Models and languages for semistructured data: Bridging documents and databases
Lectures
1. Introduction to data models
2. Query languages for relational databases
3. Models and query languages for object databases
4. Models and query languages for semistructured data, XML
5. Embedded query languages
6. Guest lecture on Object Role Modelling
Why do we like types?
- Types facilitate understanding
- Types enable compact representations
- Types enable query optimisation
- Types facilitate consistency enforcement
Background assumptions for typed data
- Data stable over time
- Organisational body to control data
- Exercise: Give an example of a context where these assumptions do not hold
Semistructured data
Semistructured data is schemaless and self-describing.
The data and the description of the data are integrated.
An example
{name: {first: “John”, last: “Smith”}, tel: …}
[Figure: the same data as a tree with edge labels name, tel, first, last and leaves “John”, “Smith”]
Another example
[Figure: a graph with two person edges leading to objects &o1 and &o2; &o1 has name “Eva”, age 40 and a child edge to &o2; &o2 has name “Abel”, age 20]
{person: &o1{name: “Eva”, age: 40, child: &o2}, person: &o2{name: “Abel”, age: 20}}
An object identifier, such as &o1, before a structure, binds the object identifier to the identity of that structure. The object identifier can then be used to refer to the structure.
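The binding and use of object identifiers can be sketched in Python. This is only an illustration under stated assumptions: the representation (a dictionary `OBJECTS` from identifiers to structures, structures as lists of (label, value) pairs) and the helper names `deref` and `sub` are hypothetical and not part of the lecture's notation. It also assumes that no ordinary string value happens to look like an identifier.

```python
# Object store: each identifier is bound to exactly one structure.
OBJECTS = {
    "&o1": [("name", "Eva"), ("age", 40), ("child", "&o2")],
    "&o2": [("name", "Abel"), ("age", 20)],
}
# The database refers to both persons by identifier.
DB = [("person", "&o1"), ("person", "&o2")]


def deref(value):
    """Replace an object identifier by the structure bound to it."""
    if isinstance(value, str) and value in OBJECTS:
        return OBJECTS[value]
    return value


def sub(node, label):
    """Values reached from `node` over edges carrying `label`."""
    node = deref(node)
    if not isinstance(node, list):
        return []  # atomic values have no outgoing edges
    return [deref(v) for (l, v) in node if l == label]


eva, abel = sub(DB, "person")
# Follow child, then name, starting from Eva.
child_names = [n for c in sub(eva, "child") for n in sub(c, "name")]
```

Following the child edge from Eva reaches the very same structure as the second person edge, which is the point of sharing by identifier.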
Terminology
The following is an ssd-expression: &o1{name: “Eva”, age: 40, child: &o2}
In it, name, age and child are labels; “Eva” and 40 are values; &o1 and &o2 are object identifiers.
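A database
[Figure: the database db as a tree rooted at biblio. Node n1 is a paper with authors Crick and Wallace, title DNA spiral, date 1956; n2 is a book with author Darwin, title Origin, date 1848; n3 is a book with author Marx, title Kapital, date 1860; further entries are elided.]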
Path expressions
A path expression is a sequence of labels: l1.l2.….ln
A path expression results in a set of nodes.
Path properties are specified by regular expressions on two levels: on the alphabet of labels and on the alphabet of characters that comprise labels.
A path expression
biblio.book.author
[Figure: the biblio database tree from the slide “A database”]
A path expression
biblio.(book | paper).author
[Figure: the biblio database tree from the slide “A database”]
Examples of path expressions
- biblio.book.author: authors of books
- biblio.paper.author: authors of papers
- biblio.(book | paper).author: authors of books or papers
- biblio._.author: authors of anything
- biblio._*.author: nodes at the ends of paths starting with biblio, ending with author, and having an arbitrary sequence of labels between
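The path expressions listed above can be evaluated with a small recursive function. The sketch below is one possible reading, not the lecture's own algorithm. It assumes a tree-shaped database stored as nested lists of (label, child) pairs. The names `ANY`, `ANY_STAR` and `eval_path` are hypothetical. A step is a set of alternative labels, `ANY` for `_`, or `ANY_STAR` for `_*`.

```python
# Edge-labelled tree: a node is either an atom or a list of (label, child) pairs.
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]

ANY = object()       # the wildcard _ (exactly one arbitrary label)
ANY_STAR = object()  # _* (an arbitrary, possibly empty, sequence of labels)


def children(node):
    """Return the outgoing (label, child) edges; atoms have none."""
    return node if isinstance(node, list) else []


def eval_path(node, steps):
    """Return the nodes reached from `node` by following `steps`."""
    if not steps:
        return [node]
    step, rest = steps[0], steps[1:]
    if step is ANY_STAR:
        # Either _* matches nothing here, or it consumes one edge and stays active.
        result = eval_path(node, rest)
        for _, child in children(node):
            result += eval_path(child, steps)
        return result
    result = []
    for label, child in children(node):
        if step is ANY or label in step:
            result += eval_path(child, rest)
    return result


book_authors = eval_path(DB, [{"biblio"}, {"book"}, {"author"}])
all_authors = eval_path(DB, [{"biblio"}, ANY_STAR, {"author"}])
```

On a tree each node has a single path from the root, so the returned list contains no duplicates. On a graph with shared objects or cycles the function would need a visited set.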
Example of a label pattern
- ((b | B)ook | (a | A)uthor)(s)? matches book, Book, author, Author, books, Books, authors, Authors
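The label pattern above translates almost literally into a regular expression of Python's standard `re` module. The helper name `label_matches` is an assumption made for this sketch. `fullmatch` is used because the pattern must cover the whole label, not just a prefix of it.

```python
import re

# Character-level pattern for a single label, written in Python regex syntax.
LABEL_PATTERN = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")


def label_matches(label):
    """True if the complete label is described by the pattern."""
    return LABEL_PATTERN.fullmatch(label) is not None


accepted = [l for l in ["book", "Books", "author", "Authors", "booklet", "BOOK"]
            if label_matches(l)]
```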
An exercise
biblio._*.author.(“[s | S]ection”)
Which of the following paths match the path expression above?
1. Biblio.author.Section
2. Biblio.cat.rat.hat.author.section
3. Biblio.author
4. Biblio.cat.author.section.Section
A simple query
select author: X from biblio.book.author X
Result: {author: “Darwin”, author: “Marx”}
A query with a condition
select row: X from biblio._ X where “Crick” in X.author
Result: {row: {author: “Crick”, author: “Wallace”, date: 1956, title: “The spiral DNA”}, …}
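As a rough analogy, the simple query and the query with a condition can be written as Python list comprehensions: each binding in the from clause becomes a `for`, and the where clause becomes an `if`. The nested-list representation and the helper `sub` are assumptions of this sketch, not the query language's actual implementation.

```python
# A node is an atom or a list of (label, value) pairs.
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def sub(node, label):
    """Values reached from `node` over edges carrying `label`."""
    if not isinstance(node, list):
        return []
    return [v for (l, v) in node if l == label]


biblio = sub(DB, "biblio")[0]

# select author: X from biblio.book.author X
simple = [("author", x)
          for book in sub(biblio, "book")
          for x in sub(book, "author")]

# select row: X from biblio._ X where "Crick" in X.author
conditional = [("row", x)
               for (_, x) in biblio
               if "Crick" in sub(x, "author")]
```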
Two exercises
1. select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
2. select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z
A database
[Figure: the biblio database tree from the slide “A database”]
select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
Nested queries
select row: (select author: Y from X.author Y) from biblio.book X
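A nested query corresponds to a comprehension inside a comprehension: the inner select is evaluated once for every binding of X. The sketch below assumes the same hypothetical nested-list representation and `sub` helper as an illustration only.

```python
# A node is an atom or a list of (label, value) pairs.
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
])]


def sub(node, label):
    """Values reached from `node` over edges carrying `label`."""
    if not isinstance(node, list):
        return []
    return [v for (l, v) in node if l == label]


biblio = sub(DB, "biblio")[0]

# select row: (select author: Y from X.author Y) from biblio.book X
nested = [("row", [("author", y) for y in sub(x, "author")])
          for x in sub(biblio, "book")]
```

Each book yields one row, and the row groups that book's authors, so the grouping per book is kept in the result.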
Three exercises
- Which authors have written a book or a paper in 1992?
- Which authors have written a book together with Jones?
- Which authors have written both a book and a paper?
Expressing relations
[Figure: two relations, r1 with columns a, b, c and r2 with columns b, d, e]
{ r1: { row: {a: 1, b: 2, c: 2}, row: {a: 1, b: 2, c: 2}, row: {a: 1, b: 2, c: 2} },
  r2: { row: {b: 1, d: 2, e: 2}, row: {b: 1, d: 2, e: 2}, row: {b: 1, d: 2, e: 2} } }
Expressing relational joins
select a: A, d: D
from r1.row X, r2.row Y, X.a A, X.b B, Y.b B’, Y.d D
where B = B’
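The join above can be mimicked in Python with one `for` per binding and the equality test as a filter. The row values below are hypothetical, chosen so that exactly one pair of rows joins; they are not the values on the slide. The `sub` helper and the choice to return one small structure per binding are likewise assumptions of this sketch.

```python
# Hypothetical relation contents, encoded as nested (label, value) pairs.
DB = [
    ("r1", [("row", [("a", 1), ("b", 2), ("c", 3)]),
            ("row", [("a", 4), ("b", 5), ("c", 6)])]),
    ("r2", [("row", [("b", 2), ("d", 7), ("e", 8)]),
            ("row", [("b", 9), ("d", 10), ("e", 11)])]),
]


def sub(node, label):
    """Values reached from `node` over edges carrying `label`."""
    if not isinstance(node, list):
        return []
    return [v for (l, v) in node if l == label]


r1 = sub(DB, "r1")[0]
r2 = sub(DB, "r2")[0]

# select a: A, d: D
# from r1.row X, r2.row Y, X.a A, X.b B, Y.b B', Y.d D where B = B'
joined = [[("a", a), ("d", d)]
          for x in sub(r1, "row")
          for y in sub(r2, "row")
          for a in sub(x, "a")
          for b in sub(x, "b")
          for b2 in sub(y, "b")
          for d in sub(y, "d")
          if b == b2]
```

This is a plain nested-loop join. It shows the meaning of the query rather than an efficient evaluation strategy.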
Label variables
select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X)
Here L is a label variable.
[Figure: the database db rooted at biblio. Node n2 is a book with author Shakespeare, title Macbeth, date 1622; n3 is a book with author Smith, title Best of Shakespeare, date 1992; further entries are elided.]
Label variables
select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X)
Result: {author: “Shakespeare”, title: “Best of Shakespeare”}
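A label variable ranges over labels instead of nodes. In the Python sketch below, the binding biblio._*.L X is approximated by enumerating every edge at any depth below biblio, so that L takes the edge's label and X its target. The generator name `all_edges` and the nested-list representation are assumptions. `matches` is approximated with `re.fullmatch`, applied to string values only.

```python
import re

# A node is an atom or a list of (label, value) pairs.
DB = [("biblio", [
    ("book", [("author", "Shakespeare"), ("title", "Macbeth"), ("date", 1622)]),
    ("book", [("author", "Smith"), ("title", "Best of Shakespeare"),
              ("date", 1992)]),
])]


def all_edges(node):
    """Yield every (label, value) edge at any depth below `node`."""
    if isinstance(node, list):
        for label, value in node:
            yield label, value
            yield from all_edges(value)


biblio = DB[0][1]

# select L: X from biblio._*.L X where matches(".*Shakespeare.*", X)
result = [(label, x)
          for (label, x) in all_edges(biblio)
          if isinstance(x, str) and re.fullmatch(".*Shakespeare.*", x)]
```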
Turning labels into data
select publ: {type: L, author: A} from biblio.L X, X.author A
[Figure: the database db rooted at biblio, with paper n1 (authors Crick and Wallace, title DNA spiral, date 1956) and book n2 (author Darwin, title Origin, date 1848)]
Result: {publ: {type: “paper”, author: “Crick”}, publ: {type: “paper”, author: “Wallace”}, publ: {type: “book”, author: “Darwin”}}
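Because the label is bound to a variable, it can be emitted as an ordinary string value in the result. A minimal Python sketch follows, again assuming the hypothetical nested-list representation and `sub` helper.

```python
# A node is an atom or a list of (label, value) pairs.
DB = [("biblio", [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
])]


def sub(node, label):
    """Values reached from `node` over edges carrying `label`."""
    if not isinstance(node, list):
        return []
    return [v for (l, v) in node if l == label]


biblio = sub(DB, "biblio")[0]

# select publ: {type: L, author: A} from biblio.L X, X.author A
# The edge label L is copied into the result as the value of "type".
publications = [("publ", [("type", label), ("author", a)])
                for (label, x) in biblio
                for a in sub(x, "author")]
```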
An exercise
- List all publications in 1992, their types, and titles.
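Basic XML syntax
XML is a textual representation of data.
An element is a piece of text bounded by tags.
[Example: an element with content John, with its start-tag, content and end-tag marked]
An element with no content can be abbreviated as a single empty-element tag.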
Basic XML syntax
Elements may contain subelements.
[Example: an element with nested subelements, one of them with content John]
XML attributes An attribute is defined by a name-value pair within a tag
XML attributes and elements
[Example markup comparing attributes with subelements; surviving text: widget 10 widget]
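XML and ssd-expressions
[Example: a person element whose name subelement has content John, shown next to the corresponding ssd-expression]
{person: {name: “John”, tel: …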
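The correspondence between XML elements and ssd-expressions can be sketched with Python's standard `xml.etree.ElementTree` module: the tag becomes the label and the content becomes the value. The function name `to_ssd` and the sample document, including the tel value, are hypothetical. The sketch ignores attributes and mixed content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tel value is made up for illustration.
DOC = """
<person>
  <name>John</name>
  <tel>2345</tel>
</person>
"""


def to_ssd(elem):
    """Map an element to a (label, value) pair.

    A leaf element becomes (tag, text); an element with subelements
    becomes (tag, list of converted subelements). Attributes are ignored.
    """
    kids = list(elem)
    if not kids:
        return (elem.tag, (elem.text or "").strip())
    return (elem.tag, [to_ssd(k) for k in kids])


ssd = to_ssd(ET.fromstring(DOC))
```

Unlike an ssd-expression, XML content is ordered and all leaf values arrive as text, so the number 2345 is the string "2345" here.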
XML references
[Example markup with contents John and Peter; callouts: element identifier, reference attribute]
Document Type Definitions
<!DOCTYPE db [ ]>
An exercise on DTDs as schemas
[Example data; surviving text: a1 b1 a2 b2 a1 b1 c2 d2 a1 b1]
Write down a DTD for the data above!
Attributes in DTDs
[Example: a name element with content trumpet]
<!ATTLIST name language CDATA #REQUIRED department CDATA #IMPLIED>
Reference attributes in DTDs
<!DOCTYPE people [
<!ATTLIST person id ID #REQUIRED boss IDREF #REQUIRED friends IDREFS #IMPLIED>
]>
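One constraint implied by ID, IDREF and IDREFS attributes is that every reference must name an identifier declared somewhere in the document. Python's `xml.etree.ElementTree` does not validate against a DTD, so the sketch below checks only this one rule by hand. The function name `dangling_references` and the sample document are hypothetical. It is not a DTD validator and does not check #REQUIRED.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the attribute names from the ATTLIST above.
DOC = """
<people>
  <person id="ann" boss="bob">Ann</person>
  <person id="bob" boss="bob" friends="ann cid">Bob</person>
</people>
"""


def dangling_references(xml_text):
    """Return (person id, reference) pairs whose target id is not declared."""
    root = ET.fromstring(xml_text)
    ids = {p.get("id") for p in root.iter("person")}
    dangling = []
    for p in root.iter("person"):
        refs = []
        if p.get("boss") is not None:
            refs.append(p.get("boss"))            # IDREF: a single name
        refs += (p.get("friends") or "").split()  # IDREFS: space-separated names
        dangling += [(p.get("id"), r) for r in refs if r not in ids]
    return dangling


problems = dangling_references(DOC)
```

A boss value containing a space is treated as one name, so it is reported as dangling, which agrees with IDREF allowing only a single identifier.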
An exercise
<person id = “sven” boss = “olle”> Sven Svensson </person>
<person id = “olle” friends = “nils eva”> Olle Olsson </person>
<person id = “pelle” boss = “nils eva”> Per Persson </person>
Does this XML element conform to the previous DTD?
Limitations of DTDs as schemas
- DTDs impose order
- No base types
- The types of IDREFs cannot be constrained
XSL - Extensible Stylesheet Language
[Example document; surviving text: t1 a1 a2 t2 a3 a4 t3 a5 a6]
Template rules and XSL patterns
[Example stylesheet with the callouts Template rule and XSL pattern; surviving text: t1 t2 t3]
Two exercises
select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
Result: {row: {title: “The spiral DNA”, date: 1956}, …}
select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z
Which authors have written a book or a paper in 1992?
select author: X from biblio.(book | paper) Y, Y.author X where Y.date = 1992
Which authors have written a book together with Jones?
select author: X from biblio.book Y, Y.author X where “Jones” in Y.author
Which authors have written both a book and a paper?
select author: A from biblio.book B, biblio.paper P, B.author A where B.author = P.author
select author: A1 from biblio.book B, biblio.paper P, B.author A1, P.author A2 where A1 = A2
List all publications in 1992, their types, and titles.
select publ: {type: L, title: T} from biblio.L X, X.title T where X.date = 1992
<!DOCTYPE db [ ]>
[Example data; surviving text: a1 b1 a2 b2 a1 b1 c2 d2 a1 b1]