1
Management of XML and Semistructured Data
Lecture 2, Wednesday, 4/4/2001
2
Outline
Semistructured data: simulation, bisimulation
XML: syntax, data model
3
The Semistructured Data Model
[Figure: an example OEM (Object Exchange Model) graph. The root Bib (&o1) has paper, paper, and book edges to complex objects (e.g. &o12, &o24, &o29), which are connected by references edges and have author, title, year, page, publisher, and http edges; author objects have firstname/lastname edges. Atomic objects include "Serge", "Abiteboul", "Victor", "Vianu", 1997, 122, 133.]
4
Syntax for Semistructured Data
May omit oids:
{ paper: { author: "Abiteboul",
           author: { firstname: "Victor", lastname: "Vianu" },
           title: "Regular path queries …",
           page: { first: 122, last: 133 }
         }
}
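A minimal Python sketch of this syntax, under an assumed encoding: an atomic value is a Python `str` or `int`, and a complex object is a list of `(label, value)` pairs. A `dict` would not work, because a label such as `author` may occur more than once in the same object. The names `SSD` and `to_syntax` are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: atomic value (str/int) or a list of (label, value) pairs.
# A dict cannot be used because labels such as "author" may repeat.
SSD = Union[str, int, List[Tuple[str, "SSD"]]]

def to_syntax(v: SSD) -> str:
    """Render a value in the { label: value, ... } syntax, with oids omitted."""
    if isinstance(v, list):
        return "{ " + ", ".join(f"{label}: {to_syntax(sub)}" for label, sub in v) + " }"
    if isinstance(v, str):
        return f'"{v}"'
    return str(v)

db = [("paper", [
    ("author", "Abiteboul"),
    ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
    ("title", "Regular path queries …"),
    ("page", [("first", 122), ("last", 133)]),
])]
print(to_syntax(db))
```

The printed line reproduces the nested object above on a single line.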
5
Set Semantics for Trees
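Want to say that {a, a, b} = {a, b}
Define equality for trees first, then for graphs
Definition: two trees $t, t'$ are equal, $t = t'$, if either
(1) they are both atomic values with the same value, or
(2) $t = \{a_1{:}t_1, \ldots, a_m{:}t_m\}$ and $t' = \{b_1{:}t'_1, \ldots, b_n{:}t'_n\}$, and
$\forall i \in 1..m,\ \exists j \in 1..n$ such that $a_i = b_j$ and $t_i = t'_j$, and
$\forall j \in 1..n,\ \exists i \in 1..m$ such that $a_i = b_j$ and $t_i = t'_j$.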
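The recursive definition translates almost literally into code. This is a sketch under a hypothetical encoding (an atomic value, or a list of `(label, subtree)` pairs), reading the shorthand {a, b} as {a: {}, b: {}}; `tree_eq` is an illustrative name.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: atomic value (str/int) or list of (label, subtree) pairs.
Tree = Union[str, int, List[Tuple[str, "Tree"]]]

def tree_eq(t1: Tree, t2: Tree) -> bool:
    """Equality of trees under set semantics, following the recursive definition."""
    complex1, complex2 = isinstance(t1, list), isinstance(t2, list)
    if not complex1 and not complex2:
        return t1 == t2                      # case (1): both atomic, same value
    if complex1 != complex2:
        return False                         # one atomic, one complex
    # case (2): every child of t1 has an equal, same-label child in t2 ...
    forward = all(any(a == b and tree_eq(s, u) for b, u in t2) for a, s in t1)
    # ... and every child of t2 has one in t1
    backward = all(any(a == b and tree_eq(s, u) for a, s in t1) for b, u in t2)
    return forward and backward

aab = [("a", []), ("a", []), ("b", [])]
ab = [("a", []), ("b", [])]
print(tree_eq(aab, ab))                  # True: {a, a, b} = {a, b}
print(tree_eq([("a", 1)], [("a", 2)]))   # False: different atomic values
```

Duplicates disappear because the two quantified conditions only ask for *some* matching child, never a one-to-one pairing.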
6
Set Semantics: Example
[Figure: two trees with edges labeled a, b, c, d, e and atomic leaves 1, 2, 3, shown to be equal (=) under set semantics.]
7
Set Semantics for Graphs
The previous definition is recursive, so it does not apply directly to graphs with cycles: the recursion would never terminate.
We need to adapt it; the adapted notion is bisimulation.
First, we define a simulation.
8
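Graph Simulation
Definition: given two edge-labeled graphs G1, G2, a simulation is a relation R between their nodes such that: if $(x_1, x_2) \in R$ and $(x_1, a, y_1) \in G_1$, then there exists $(x_2, a, y_2) \in G_2$ (same label $a$) such that $(y_1, y_2) \in R$.
[Figure: an edge x1 -a-> y1 in G1 and an edge x2 -a-> y2 in G2, with R relating x1 to x2 and y1 to y2.]
Note: if we insist that R be a function, we get a graph homomorphism.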
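The condition can be checked directly for a given relation. In this sketch a graph is assumed to be a set of `(source, label, target)` edges and R a set of node pairs; this encoding and the name `is_simulation` are hypothetical.

```python
from typing import Hashable, Set, Tuple

# Hypothetical encoding: a graph is a set of labeled edges (source, label, target);
# a relation R is a set of (node of G1, node of G2) pairs.
Edge = Tuple[Hashable, str, Hashable]

def is_simulation(R: Set[Tuple[Hashable, Hashable]], G1: Set[Edge], G2: Set[Edge]) -> bool:
    """Check the simulation condition for every pair in R and every edge leaving x1."""
    for x1, x2 in R:
        for s, a, y1 in G1:
            if s != x1:
                continue
            # need some edge (x2, a, y2) in G2 (same label) with (y1, y2) in R
            if not any(t == x2 and b == a and (y1, y2) in R for t, b, y2 in G2):
                return False
    return True

G1 = {("r1", "a", "p")}
G2 = {("r2", "a", "q"), ("r2", "b", "w")}
print(is_simulation({("r1", "r2"), ("p", "q")}, G1, G2))   # True
print(is_simulation({("r1", "r2")}, G1, G2))               # False: (p, q) is missing
```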
9
Graph Bisimulation
Definition: given two edge-labeled graphs G1, G2, a bisimulation is a relation R between their nodes such that both $R$ and $R^{-1}$ are simulations.
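As a sketch, assuming the same hypothetical edge-set encoding as for simulations, a bisimulation check simply runs the simulation check on R and on its inverse:

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]          # (source, label, target), hypothetical encoding
Rel = Set[Tuple[Hashable, Hashable]]

def is_simulation(R: Rel, G1: Set[Edge], G2: Set[Edge]) -> bool:
    """Every edge (x1, a, y1) with (x1, x2) in R is matched by some (x2, a, y2) with (y1, y2) in R."""
    return all(
        any(t == x2 and b == a and (y1, y2) in R for t, b, y2 in G2)
        for x1, x2 in R
        for s, a, y1 in G1
        if s == x1
    )

def is_bisimulation(R: Rel, G1: Set[Edge], G2: Set[Edge]) -> bool:
    """R is a bisimulation iff R and its inverse R^-1 are both simulations."""
    inverse = {(x2, x1) for x1, x2 in R}
    return is_simulation(R, G1, G2) and is_simulation(inverse, G2, G1)

# The extra b-edge in G2 has no counterpart in G1, so R is only a simulation.
G1 = {("r1", "a", "p")}
G2 = {("r2", "a", "q"), ("r2", "b", "w")}
R = {("r1", "r2"), ("p", "q")}
print(is_simulation(R, G1, G2), is_bisimulation(R, G1, G2))   # True False

# A one-node a-cycle and a two-node a-cycle are bisimilar.
C1 = {("r", "a", "r")}
C2 = {("s", "a", "t"), ("t", "a", "s")}
print(is_bisimulation({("r", "s"), ("r", "t")}, C1, C2))       # True
```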
10
Set Semantics for Semistructured Data
Definition: two rooted graphs G1, G2 are equal if there exists a bisimulation R from G1 to G2 such that $(\mathrm{root}(G_1), \mathrm{root}(G_2)) \in R$.
Notation: $G_1 \approx G_2$
For trees, this is precisely our earlier definition.
11
Examples of Bisimilar Graphs
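[Figure: pairs of graphs shown equal (=), i.e. bisimilar, with edges labeled a and c; one pair involves a repeating sequence of a-edges (a, a, a, ...).]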
12
Examples of non-Bisimilar Graphs
[Figure: two graphs G1, G2 with edges labeled b and c.]
This is a simulation but not a bisimulation. Why?
Notice: G1 and G2 have the same sets of paths.
13
Examples of Simulation
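Simulation acts like "subset": write $DB_1 \sqsubseteq DB_2$ if some simulation from $DB_1$ to $DB_2$ relates their roots. Then:
$\{a, b\} \sqsubseteq \{a, b, c\}$
$\{a, b{:}\{c\}\} \sqsubseteq \{d, a{:}\{e, f\}, b{:}\{c, g\}\}$
Question: if $DB_1 \sqsubseteq DB_2$ and $DB_2 \sqsubseteq DB_1$, does it follow that $DB_1 = DB_2$ (i.e., that they are bisimilar)?
[Figure: the two pairs above drawn as trees.]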
14
Facts About a (Bi)Simulation
The empty set is always a (bi)simulation.
If R, R' are (bi)simulations, so is $R \cup R'$.
Hence there always exists a maximal (bi)simulation: the union of all (bi)simulations.
Checking if DB1 = DB2: compute the maximal bisimulation R, then test whether $(\mathrm{root}(DB_1), \mathrm{root}(DB_2)) \in R$.
15
Computing a (Bi)Simulation
Computing the maximal (bi)simulation:
Start with $R = \mathrm{nodes}(G_1) \times \mathrm{nodes}(G_2)$.
While there exists $(x_1, x_2) \in R$ that violates the definition, remove $(x_1, x_2)$ from $R$.
This runs in polynomial time!
Better: $O((m+n)\log(m+n))$ for bisimulation, $O(mn)$ for simulation.
Compare to finding a graph homomorphism, which is NP-complete!
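The loop above can be sketched in Python as a naive fixpoint, assuming the hypothetical edge-set encoding used earlier plus an explicit root per graph. Instead of removing one pair at a time, it removes all currently violating pairs in each round; this stays correct because a pair belonging to the maximal (bi)simulation never violates the condition while R still contains that maximal relation. It is the slide's polynomial algorithm, not the faster partition-refinement algorithms (such as Paige-Tarjan for bisimulation). The "same paths" example is a hypothetical graph in the spirit of the non-bisimilar example, not necessarily the one drawn on that slide.

```python
from itertools import product

# Hypothetical encoding: a rooted graph = a root plus a set of (source, label, target) edges.

def nodes(G, root):
    return {root} | {x for s, _, t in G for x in (s, t)}

def forward_ok(x1, x2, R, G1, G2):
    """Every edge (x1, a, y1) of G1 is matched by some (x2, a, y2) of G2 with (y1, y2) in R."""
    return all(
        any(t == x2 and b == a and (y1, y2) in R for t, b, y2 in G2)
        for s, a, y1 in G1
        if s == x1
    )

def maximal(G1, r1, G2, r2, bi=False):
    """Start from nodes(G1) x nodes(G2) and drop violating pairs until none remain."""
    R = set(product(nodes(G1, r1), nodes(G2, r2)))
    while True:
        inverse = {(y, x) for x, y in R}
        bad = {
            (x1, x2)
            for x1, x2 in R
            if not forward_ok(x1, x2, R, G1, G2)
            or (bi and not forward_ok(x2, x1, inverse, G2, G1))
        }
        if not bad:
            return R
        R -= bad

def simulated(G1, r1, G2, r2):
    """G1 is simulated by G2, i.e. G1 is 'a subset of' G2."""
    return (r1, r2) in maximal(G1, r1, G2, r2)

def bisimilar(G1, r1, G2, r2):
    """G1 = G2 under set semantics: the roots are related by the maximal bisimulation."""
    return (r1, r2) in maximal(G1, r1, G2, r2, bi=True)

# {a, b} versus {a, b, c}, with each leaf as a fresh node
AB = {("r", "a", "p"), ("r", "b", "q")}
ABC = {("s", "a", "u"), ("s", "b", "v"), ("s", "c", "w")}
print(simulated(AB, "r", ABC, "s"), simulated(ABC, "s", AB, "r"))   # True False

# Same paths {a, ab, ac}: one a-child with b and c, versus two a-children.
G1 = {("r", "a", "x"), ("x", "b", "u"), ("x", "c", "v")}
G2 = {("s", "a", "y1"), ("y1", "b", "w1"), ("s", "a", "y2"), ("y2", "c", "w2")}
print(simulated(G2, "s", G1, "r"), bisimilar(G1, "r", G2, "s"))     # True False

# A one-node a-cycle equals a two-node a-cycle.
print(bisimilar({("r", "a", "r")}, "r", {("s", "a", "t"), ("t", "a", "s")}, "s"))  # True
```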
16
XML
a W3C standard to complement HTML
origins: structured text, SGML
motivation: HTML describes presentation, XML describes content
XML 1.0, Second Edition (10/2000)
17
From HTML to XML HTML describes the presentation
18
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
19
XML
XML describes the content
<bibliography>
  <book> <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
20
XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: <red></red>, abbreviated <red/>
an XML document has a single root element
well-formed XML document: its tags match and are properly nested
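Python's standard `xml.etree.ElementTree` parser rejects input that is not well-formed, so it can serve as a quick check. This sketch (the helper name `is_well_formed` is hypothetical) exercises matching tags, the single-root rule, and the `<red/>` abbreviation. Note that the parser enforces all well-formedness rules, not only tag matching.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if text parses as an XML document; the parser rejects non-well-formed input."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>Foundations</title></book>"))   # True
print(is_well_formed("<book><title>Foundations</book></title>"))   # False: tags do not match
print(is_well_formed("<red/><blue/>"))                             # False: two root elements

# An empty element and its abbreviation parse alike: no children, no text.
full, short = ET.fromstring("<red></red>"), ET.fromstring("<red/>")
print(len(full) == len(short) == 0, full.text is None and short.text is None)   # True True
```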
21
More XML: Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Attributes are an alternative way to represent data.
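To see the two representations side by side, here is a sketch with Python's ElementTree; `price_of` is a hypothetical helper that accepts the price either as an attribute or as a child element.

```python
import xml.etree.ElementTree as ET

def price_of(book: ET.Element):
    """Hypothetical helper: read the price whether stored as an attribute or as a child."""
    if "price" in book.attrib:
        return book.get("price")
    child = book.find("price")
    return child.text.strip() if child is not None else None

as_attribute = ET.fromstring('<book price="55" currency="USD"><year> 1995 </year></book>')
as_element = ET.fromstring("<book><price> 55 </price><currency> USD </currency></book>")
print(as_attribute.attrib)                            # {'price': '55', 'currency': 'USD'}
print(price_of(as_attribute), price_of(as_element))   # 55 55
```

One design difference shows up immediately: attribute values are always plain strings and an attribute name may appear only once per element, whereas child elements can repeat (like `author`) and can themselves have structure.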
22
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
  <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"> <name> John </name> </person>
oids and references in XML are just syntax
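A sketch of what "just syntax" means in practice: Python's ElementTree (a non-validating parser) returns `id`, `idref`, and `mother` as ordinary attribute strings, so turning them into graph edges is up to the application. The `<people>` wrapper and the helper `name` are hypothetical additions.

```python
import xml.etree.ElementTree as ET

# The slide's three <person> elements, wrapped in a hypothetical <people> root
# because a document needs a single root element.
doc = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(doc)
# To the parser, id, idref and mother are plain strings; we index the oids ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}

def name(person):
    return person.find("name").text

edges = []  # (from, label, to) edges of the graph the references describe
for p in root.iter("person"):
    mother = p.get("mother")
    if mother is not None:
        edges.append((name(p), "mother", name(by_id[mother])))
    kids = p.find("children")
    if kids is not None:
        for ref in kids.get("idref").split():   # a whitespace-separated list of oids
            edges.append((name(p), "child", name(by_id[ref])))

print(edges)
# [('Mary', 'child', 'John'), ('Mary', 'child', 'Jane'), ('John', 'mother', 'Mary')]
```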
23
More XML: CDATA Section
Syntax: <![CDATA[ ...any text here... ]]>
Example: <example> <![CDATA[ some text here </notAtag> <>]]> </example>
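Parsing the example with Python's ElementTree shows that a CDATA section is only a quoting device: its content becomes ordinary character data, and no element is created for `</notAtag>`. On output, ElementTree escapes the markup characters instead of re-emitting a CDATA section.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<example> <![CDATA[ some text here </notAtag> <>]]> </example>")
print(len(elem))            # 0: the CDATA content produced no child elements
print(elem.text.strip())    # some text here </notAtag> <>
print(ET.tostring(elem))    # markup characters come back as &lt; and &gt;
```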
24
More XML: Entity References
Syntax: &entityname;
Example: <element> this is less than &lt; </element>
Some entities:
&lt; for <
&gt; for >
&amp; for &
&apos; for '
&quot; for "
&#38; for the Unicode character 38, i.e. &
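With Python's standard library, the parser replaces entity and character references by the characters they denote, and `xml.sax.saxutils.escape` goes the other way for `&`, `<` and `>`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

elem = ET.fromstring("<element> this is less than &lt; </element>")
print(elem.text.strip())       # this is less than <

# escape() replaces &, < and > with entity references.
print(escape("a < b & c"))     # a &lt; b &amp; c

# Predefined entities and a character reference (&#38; is code point 38, "&").
print(ET.fromstring("<e>&#38;&apos;&quot;</e>").text)   # &'"
```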
25
More XML: Processing Instructions
Syntax: <?target argument?>
Example: <product> <name> Alarm Clock </name> <?ringBell 20?> <price> </price> </product>
What do they mean?
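A parser only reports a processing instruction's target and its data string; what `ringBell 20` means is left entirely to the application. A sketch with Python's `xml.dom.minidom`, which keeps PIs as nodes of the tree:

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<product> <name> Alarm Clock </name> <?ringBell 20?> <price> </price> </product>"
)
# Collect (target, data) for each processing-instruction child of <product>.
pis = [
    (node.target, node.data)
    for node in doc.documentElement.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(pis)  # [('ringBell', '20')]
```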
26
More XML: Comments
Syntax: <!-- ...comment text... -->
Yes, they are part of the data model!
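Whether comments survive parsing depends on the data model an API implements. In this sketch (the document text is hypothetical), Python's DOM implementation keeps the comment as a node, ElementTree's default tree builder drops it, and `TreeBuilder(insert_comments=True)` (Python 3.8+) keeps it.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical document with a comment inside an element.
text = "<talk><!-- draft: check title --><title>XML</title></talk>"

# DOM keeps the comment as a node of the tree.
dom_root = minidom.parseString(text).documentElement
dom_comments = [n.data for n in dom_root.childNodes if n.nodeType == n.COMMENT_NODE]
print(dom_comments)                # [' draft: check title ']

# ElementTree's default builder drops comments ...
print(len(ET.fromstring(text)))    # 1: only <title>

# ... unless asked to keep them (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(text, parser=parser)
print(kept[0].tag is ET.Comment, repr(kept[0].text))   # True ' draft: check title '
```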
27
XML Namespaces http://www.w3.org/TR/REC-xml-names (1/99)
name ::= [prefix:]localpart
<book xmlns:isbn="…">
  <title> … </title>
  <number> 15 </number>
  <isbn:number> … </isbn:number>
</book>
28
XML Namespaces
syntactic: <number>, <isbn:number>
semantic: provide a URL for the schema
<tag xmlns:mystyle="…">   (mystyle is defined here)
  …
  <mystyle:title> … </mystyle:title>
  <mystyle:number> …
</tag>
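A sketch of how a namespace-aware parser treats prefixes: Python's ElementTree replaces the prefix by the namespace URI (written `{uri}localpart`), so `number` and `isbn:number` become different names while the prefix itself is only syntax. The URI `http://example.org/isbn` and the element content `n/a` are hypothetical stand-ins for the elided values.

```python
import xml.etree.ElementTree as ET

ISBN = "http://example.org/isbn"   # hypothetical namespace URI
book = ET.fromstring(
    f'<book xmlns:isbn="{ISBN}"><number>15</number><isbn:number>n/a</isbn:number></book>'
)
# The parser replaces the prefix with the namespace URI.
print([child.tag for child in book])   # ['number', '{http://example.org/isbn}number']
# Lookups may use any prefix, as long as it maps to the same URI.
print(book.find("x:number", {"x": ISBN}).text)   # n/a
print(book.find("number").text)                  # 15
```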
29
XML Data Model
Several competing models:
Document Object Model (DOM) (2/2001):
  class hierarchy (node, element, attribute, …)
  objects have behavior
  defines an API to inspect/modify the document
XSL data model
Infoset
PSV (post-schema-validation)
XML Query data model (next)
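Python's `xml.dom.minidom` is a lightweight implementation of the DOM interfaces, so it can illustrate the "API to inspect/modify the document" point. A minimal sketch using a fragment of the bibliography example:

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>Foundations of Databases</title></book>")
book = doc.documentElement

# Inspect: walk the class hierarchy (Document -> Element -> Text node).
title = book.getElementsByTagName("title")[0]
print(title.firstChild.data)          # Foundations of Databases

# Modify: create new nodes through the document and attach them.
year = doc.createElement("year")
year.appendChild(doc.createTextNode("1995"))
book.appendChild(year)
print(book.toxml())
# <book><title>Foundations of Databases</title><year>1995</year></book>
```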
30
XML Query Data Model http://www.w3.org/TR/query-datamodel/ 2/2001
Describes XML as a tree with specialized node types
Uses a functional-style notation (think ML)
31
XML Query Data Model Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode
32
XML Query Data Model Element node (simplified definition):
elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
QNameValue means "a tag name"
{...} means "set of ..."
[...] means "list of ..."
33
XML Query Data Model Reads: “give me a tag, a set of attributes, a list of elements/values, and I will return an element”
34
XML Query Data Model Example
book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
35
XML Query Data Model Attribute node:
attrNode : (QNameValue, ValueNode) → AttrNode
36
XML Query Data Model Example
price2 = attrNode(price, string10)
string10 = valueNode(…) /* next */
currency3 = attrNode(currency, string11)
string11 = valueNode(…)
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
37
XML Query Data Model Value node:
ValueNode = StringValue | BoolValue | FloatValue | …
stringValue : string → StringValue
boolValue : boolean → BoolValue
floatValue : float → FloatValue
38
XML Query Data Model Example
price2 = attrNode(price, string10)
string10 = valueNode(stringValue("55"))
currency3 = attrNode(currency, string11)
string11 = valueNode(stringValue("USD"))
title4 = elemNode(title, string9)
string9 = valueNode(stringValue("Foundations…"))
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
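The constructors can be mirrored with frozen Python dataclasses, as a sketch. The class and helper names (`ValueNode`, `AttrNode`, `ElemNode`, `elem_node`, `text_elem`) are hypothetical; the year is stored as a string because the slides do not show its value node, and the full title is taken from the earlier slides. Using a `frozenset` for attributes and a `tuple` for children makes the "set of" versus "list of" distinction observable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Hypothetical Python rendering of the simplified constructors on these slides.
@dataclass(frozen=True)
class ValueNode:
    value: Union[str, bool, float]          # stringValue | boolValue | floatValue

@dataclass(frozen=True)
class AttrNode:
    name: str                               # QNameValue
    value: ValueNode

@dataclass(frozen=True)
class ElemNode:
    name: str                               # QNameValue: a tag name
    attrs: FrozenSet[AttrNode]              # {AttrNode}: a set
    children: Tuple[Union["ElemNode", ValueNode], ...]   # [ElemNode | ValueNode]: a list

def elem_node(name, attrs=(), children=()):
    """Hypothetical helper mirroring elemNode(QNameValue, {AttrNode}, [...])."""
    return ElemNode(name, frozenset(attrs), tuple(children))

def text_elem(name, text):
    return elem_node(name, children=[ValueNode(text)])

price2 = AttrNode("price", ValueNode("55"))
currency3 = AttrNode("currency", ValueNode("USD"))
title4 = text_elem("title", "Foundations of Databases")
author5, author6, author7 = (text_elem("author", n) for n in ("Abiteboul", "Hull", "Vianu"))
year8 = text_elem("year", "1995")
book1 = elem_node("book", {price2, currency3}, [title4, author5, author6, author7, year8])

# Attributes are a set (order-insensitive); children are a list (order matters).
same = elem_node("book", {currency3, price2}, [title4, author5, author6, author7, year8])
swapped = elem_node("book", {price2, currency3}, [title4, author6, author5, author7, year8])
print(book1 == same, book1 == swapped)      # True False
```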
39
XML v.s. Semistructured Data
both are described best by a graph
both are schema-less, self-describing
40
Similarities and Differences
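<person id="o123">
  <name> Alan </name>
  <age> 42 </age>
  <…> … </…>
</person>
{ person: &o123 { name: "Alan", age: 42, … } }
<person father="o123"> … </person>
{ person: { father: &o123 … } }
[Figure: a graph in which a person node has name "Alan" and age 42, and a second person node has a father edge pointing to it.]
Similar on trees, different on graphs: in semistructured data, &o123 is an object identity and father is an edge to that object; in XML, father="o123" is just an attribute with a string value.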
41
More Differences
XML is ordered; ssd is not
XML can mix text and elements:
<talk> Making Java easier to type, and easier to type <speaker> Phil Wadler </speaker> </talk>
XML has lots of other stuff: entities, processing instructions, comments
Very important: these differences make XML data management harder
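Both differences are visible in Python's ElementTree, sketched below: mixed content is split across an element's `.text` and each child's `.tail`, and child order is part of the data, so a set-style (ssd-like) comparison can only be obtained by deliberately discarding it. The two `<book>` documents are hypothetical.

```python
import xml.etree.ElementTree as ET

talk = ET.fromstring(
    "<talk> Making Java easier to type, and easier to type "
    "<speaker> Phil Wadler </speaker> </talk>"
)
# Mixed content: text before the first child is .text, text after a child is its .tail.
print(repr(talk.text))                    # ' Making Java easier to type, and easier to type '
print(talk[0].tag, repr(talk[0].tail))    # speaker ' '

# Order is part of the XML data: swapping two authors gives a different document ...
a = ET.fromstring("<book><author>Abiteboul</author><author>Hull</author></book>")
b = ET.fromstring("<book><author>Hull</author><author>Abiteboul</author></book>")
print([e.text for e in a] == [e.text for e in b])   # False
# ... while a set-based, ssd-style comparison ignores it.
print({e.text for e in a} == {e.text for e in b})   # True
```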
42
Summary of Data Models semistructured data, XML
data is self-describing and irregular; the schema is embedded with the data