Management of XML and Semistructured Data


1 Management of XML and Semistructured Data
Lecture 2, Wednesday, 4/4/2001

2 Outline
Semistructured data
Simulation
Bisimulation
XML
Syntax
Data model

3 The Semistructured Data Model
[Figure: an Object Exchange Model (OEM) graph. The root Bib (&o1) is a complex object with edges paper, paper, book to &o12, &o24, &o29. These objects are linked by references edges and carry author, title, year, page, http, and publisher edges. Atomic objects at the leaves hold values such as "Serge", "Abiteboul", "Victor", "Vianu", 1997, 122, 133.]

4 Syntax for Semistructured Data
May omit oid's:
{ paper: { author: "Abiteboul",
           author: { firstname: "Victor", lastname: "Vianu" },
           title: "Regular path queries …",
           page: { first: 122, last: 133 } } }

5 Set Semantics for Trees
Want to say that {a, a, b} = {a, b}
Define equality for trees first, then for graphs
Definition: Two trees t, t’ are equal, t = t’, if either:
They are both atomic values with the same value, or
t = {t1, ..., tm}, t’ = {t1’, ..., tn’} and:
∀ i=1,...,m, ∃ j=1,...,n s.t. ti = tj’
∀ j=1,...,n, ∃ i=1,...,m s.t. ti = tj’
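The definition above can be sketched in a few lines of Python. This is an illustration only, under an assumed encoding that is not from the slides: an atomic value is any non-list Python value, and a complex tree is a list of (label, subtree) pairs, so that duplicates can be written down explicitly. The function name `tree_equal` is hypothetical.

```python
def tree_equal(t, u):
    """Set-semantics equality: order and duplicates among children are ignored."""
    t_atomic = not isinstance(t, list)
    u_atomic = not isinstance(u, list)
    if t_atomic or u_atomic:
        # Atomic values are equal only to identical atomic values.
        return t_atomic and u_atomic and t == u

    def covered(xs, ys):
        # Every labeled child in xs has an equal child with the same label in ys.
        return all(
            any(la == lb and tree_equal(a, b) for lb, b in ys)
            for la, a in xs
        )

    # Both directions of the definition (for all i exists j, for all j exists i).
    return covered(t, u) and covered(u, t)


with_duplicate = [("a", 1), ("a", 1), ("b", 2)]
without_duplicate = [("b", 2), ("a", 1)]
print(tree_equal(with_duplicate, without_duplicate))  # True
print(tree_equal(without_duplicate, [("a", 1)]))      # False
```

The recursion terminates because trees are finite; it would loop forever on a cyclic graph, which is why graphs need a different definition.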

6 Set Semantics: Example
[Figure: two trees with edge labels a, b, c, d, e and leaf values 1, 2, 3 that are equal under set semantics]

7 Set Semantics for Graphs
Previous definition does not apply directly to graphs with cycles
Need to adapt it → bisimulation
First, we will define a simulation

8 Graph Simulation
Definition: Given two edge-labeled graphs G1, G2, a simulation is a relation R between their nodes such that:
if (x1, x2) ∈ R and (x1, a, y1) ∈ G1, then there exists (x2, a, y2) ∈ G2 (same label) s.t. (y1, y2) ∈ R
[Figure: x1 in G1 and x2 in G2 related by R; edges x1 -a-> y1 and x2 -a-> y2, with y1 and y2 related by R]
Note: if we insist that R be a function → graph homomorphism

9 Graph Bisimulation
Definition: Given two edge-labeled graphs G1, G2, a bisimulation is a relation R between their nodes s.t. both R and R⁻¹ are simulations

10 Set Semantics for Semistructured Data
Definition: Two rooted graphs G1, G2 are equal if there exists a bisimulation R from G1 to G2 such that (root(G1), root(G2)) ∈ R
Notation: G1 ≈ G2
For trees, this is precisely our earlier definition

11 Examples of Bisimilar Graphs
[Figure: pairs of edge-labeled graphs (labels a and c) related by =, i.e. bisimilar; one example continues as a a a ...]

12 Examples of non-Bisimilar Graphs
[Figure: two graphs G1, G2 with edges labeled b and c]
This is a simulation but not a bisimulation. Why?
Notice: G1, G2 have the same sets of paths

13 Examples of Simulation
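Simulation acts like “subset” (below, DB1 ≤ DB2 means there is a simulation from DB1 to DB2 relating the roots, and ≈ means bisimilar):
{a, b} ≤ {a, b, c}
{a, b:{c}} ≤ {d, a:{e,f}, b:{c,g}}
Question: if DB1 ≤ DB2 and DB2 ≤ DB1, then DB1 ≈ DB2 ?
[Figure: the two pairs of trees above drawn as edge-labeled graphs]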

14 Facts About a (Bi)Simulation
The empty set is always a (bi)simulation
If R, R’ are (bi)simulations, so is R ∪ R’
Hence, there always exists a maximal (bi)simulation
Checking if DB1 = DB2: compute the maximal bisimulation R, then test whether (root(DB1), root(DB2)) is in R

15 Computing a (Bi)Simulation
Computing the maximal (bi)simulation:
Start with R = nodes(G1) x nodes(G2)
While there exists (x1, x2) ∈ R that violates the definition, remove (x1, x2) from R
This runs in polynomial time!
Better: O((m+n) log(m+n)) for bisimulation, O(m n) for simulation
Compare to finding a graph homomorphism!
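A minimal Python sketch of the naive removal loop described above. It is the simple polynomial-time version, not the faster algorithms the slide mentions. The encoding is an assumption: a graph is a set of (source, label, target) triples, nodes are taken from the edges, and the names `maximal_relation`, `G1`, and `G2` are hypothetical.

```python
def maximal_relation(g1, g2, bisimulation=False):
    """Return the maximal simulation (or bisimulation) between g1 and g2."""
    nodes1 = {n for (x, _, y) in g1 for n in (x, y)}
    nodes2 = {n for (x, _, y) in g2 for n in (x, y)}
    rel = {(x1, x2) for x1 in nodes1 for x2 in nodes2}  # start with all pairs

    def forward_ok(x1, x2):
        # Every edge out of x1 is matched by a same-label edge out of x2.
        return all(
            any(s2 == x2 and b == a and (y1, y2) in rel for (s2, b, y2) in g2)
            for (s1, a, y1) in g1 if s1 == x1
        )

    def backward_ok(x1, x2):
        # The same condition with the roles of the two graphs swapped.
        return all(
            any(s1 == x1 and a == b and (y1, y2) in rel for (s1, a, y1) in g1)
            for (s2, b, y2) in g2 if s2 == x2
        )

    changed = True
    while changed:  # repeat until no pair violates the definition
        changed = False
        for (x1, x2) in list(rel):
            ok = forward_ok(x1, x2) and (not bisimulation or backward_ok(x1, x2))
            if not ok:
                rel.discard((x1, x2))
                changed = True
    return rel


# G1: one a-edge followed by a choice of b or c.
G1 = {("r", "a", "n"), ("n", "b", "nb"), ("n", "c", "nc")}
# G2: the choice between b and c is made at the root.
G2 = {("s", "a", "m1"), ("s", "a", "m2"), ("m1", "b", "mb"), ("m2", "c", "mc")}

print(("s", "r") in maximal_relation(G2, G1))                     # True
print(("r", "s") in maximal_relation(G1, G2, bisimulation=True))  # False
```

In this example G1 and G2 have the same sets of paths and G1 simulates G2, yet the roots are not bisimilar.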

16 XML
a W3C standard to complement HTML
origins: structured text, SGML
motivation:
HTML describes presentation
XML describes content
(XML 1.0, second edition, 10/2000)

17 From HTML to XML HTML describes the presentation

18 HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999

19 XML
XML describes the content
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>

20 XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: <red></red>, abbreviated <red/>
an XML document: single root element
well formed XML document: if it has matching tags

21 More XML: Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data

22 More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name> <children idref="o123 o555"/> </person>
<person id="o123" mother="o456"> <name> John </name> </person>
oids and references in XML are just syntax
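A small Python sketch of what "just syntax" means in practice: without a DTD, a parser such as `xml.etree.ElementTree` sees `id`, `idref`, and `mother` as ordinary string attributes, and the application has to resolve them itself. The wrapping `<people>` root element and the `by_id` dictionary are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
""")

# Build the oid -> element index by hand; the parser does not do this for us.
by_id = {person.get("id"): person for person in doc.iter("person")}

# Follow the references from Mary to her children.
mary = by_id["o456"]
child_ids = mary.find("children").get("idref").split()
print([by_id[oid].findtext("name") for oid in child_ids])  # ['John', 'Jane']

# Follow the reference from John back to his mother.
john = by_id["o123"]
print(by_id[john.get("mother")].findtext("name"))  # Mary
```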

23 More XML: CDATA Section
Syntax: <![CDATA[ .....any text here...]]>
Example:
<example>
  <![CDATA[ some text here </notAtag> <>]]>
</example>

24 More XML: Entity References
Syntax: &entityname;
Example: <element> this is less than &lt; </element>
Some entities:
&lt; <
&gt; >
&amp; &
&apos; '
&quot; "
&#38; Unicode char (here the character &)
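A short Python sketch of entity references in both directions, using the standard library: `xml.sax.saxutils.escape` rewrites the characters `&`, `<`, and `>` as entity references, and the parser turns references back into characters. The sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "this is less than < and this is an ampersand &"
escaped = escape(raw)
print(escaped)  # this is less than &lt; and this is an ampersand &amp;

# Parsing replaces each reference by the character it stands for.
element = ET.fromstring(f"<element>{escaped}</element>")
print(element.text == raw)  # True

# The predefined entities and a numeric character reference.
sample = ET.fromstring("<e>&lt; &gt; &amp; &apos; &quot; &#38;</e>")
print(sample.text)  # < > & ' " &
```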

25 More XML: Processing Instructions
Syntax: <?target argument?>
Example:
<product>
  <name> Alarm Clock </name>
  <?ringBell 20?>
  <price> </price>
</product>
What do they mean?

26 More XML: Comments Syntax <!-- .... Comment text... -->
Yes, they are part of the data model !!!

27 XML Namespaces
http://www.w3.org/TR/REC-xml-names (1/99)
name ::= [prefix:]localpart
<book xmlns:isbn="…">
  <title> … </title>
  <number> 15 </number>
  <isbn:number> …. </isbn:number>
</book>

28 XML Namespaces
syntactic: <number>, <isbn:number>
semantic: provide URL for schema
<tag xmlns:mystyle="…">  (the prefix mystyle is defined here)
  <mystyle:title> … </mystyle:title>
  <mystyle:number> … </mystyle:number>
</tag>
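The following Python sketch shows how a namespace-aware parser treats the two `number` elements as different names. The namespace URI `http://example.org/isbn` and the element contents are placeholders invented for the example, since the slides do not show the real values. `xml.etree.ElementTree` expands a prefixed name into `{uri}localpart`.

```python
import xml.etree.ElementTree as ET

ISBN_NS = "http://example.org/isbn"  # hypothetical URI, not taken from the slides

book = ET.fromstring(f"""
<book xmlns:isbn="{ISBN_NS}">
  <title>some title</title>
  <number>15</number>
  <isbn:number>some-isbn</isbn:number>
</book>
""")

# The prefix is replaced by the URI it is bound to.
print([child.tag for child in book])
# ['title', 'number', '{http://example.org/isbn}number']

# Lookups take a prefix-to-URI map; the prefix used here need not match the document's.
print(book.find("number").text)                      # 15
print(book.find("i:number", {"i": ISBN_NS}).text)    # some-isbn
```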

29 XML Data Model
Several competing models:
Document Object Model (DOM) (2/2001):
  class hierarchy (node, element, attribute, …)
  objects have behavior
  defines API to inspect/modify the document
XSL data model
Infoset
PSV (post schema validation)
XML Query data model (next)

30 XML Query Data Model http://www.w3.org/TR/query-datamodel/ 2/2001
Describes XML as a tree, specialized nodes Uses a functional-style notation (think ML)

31 XML Query Data Model
Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode

32 XML Query Data Model
Element node (simplified definition):
elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
QNameValue = means “a tag name”
{...} = means “set of ...”
[...] = means “list of ...”

33 XML Query Data Model Reads: “give me a tag, a set of attributes, a list of elements/values, and I will return an element”

34 XML Query Data Model Example
book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>

35 XML Query Data Model
Attribute node:
attrNode : (QNameValue, ValueNode) → AttrNode

36 XML Query Data Model Example
price2 = attrNode(price, string10)
string10 = valueNode(…) /* next */
currency3 = attrNode(currency, string11)
string11 = valueNode(…)
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>

37 XML Query Data Model
Value node:
ValueNode = StringValue | BoolValue | FloatValue …
stringValue : string → StringValue
boolValue : boolean → BoolValue
floatValue : float → FloatValue

38 XML Query Data Model Example
price2 = attrNode(price, string10)
string10 = valueNode(stringValue("55"))
currency3 = attrNode(currency, string11)
string11 = valueNode(stringValue("USD"))
title4 = elemNode(title, string9)
string9 = valueNode(stringValue("Foundations…"))
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
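The constructor-style notation of the last few slides can be mirrored in Python. This is a rough sketch, not the W3C data model itself: the class and function names (`ValueNode`, `AttrNode`, `ElemNode`, `elem_node`, `to_xml`) are chosen for the example, only three node kinds are covered, and the serializer does no escaping. Attributes are held in a frozenset (unordered) and children in a tuple (ordered), matching the {...} and [...] conventions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class ValueNode:
    value: Union[str, bool, float]


@dataclass(frozen=True)
class AttrNode:
    name: str
    value: ValueNode


@dataclass(frozen=True)
class ElemNode:
    tag: str
    attrs: FrozenSet[AttrNode]                            # set of attributes
    children: Tuple[Union["ElemNode", ValueNode], ...]    # ordered list


def elem_node(tag, attrs, children):
    """Give me a tag, a set of attributes, a list of elements/values: get an element."""
    return ElemNode(tag, frozenset(attrs), tuple(children))


def to_xml(node):
    """Serialize a node; attributes are sorted by name because the set has no order."""
    if isinstance(node, ValueNode):
        return str(node.value)
    ordered = sorted(node.attrs, key=lambda a: a.name)
    attrs = "".join(f' {a.name}="{a.value.value}"' for a in ordered)
    body = "".join(to_xml(child) for child in node.children)
    return f"<{node.tag}{attrs}>{body}</{node.tag}>"


price2 = AttrNode("price", ValueNode("55"))
currency3 = AttrNode("currency", ValueNode("USD"))
title4 = elem_node("title", [], [ValueNode("Foundations…")])
authors = [elem_node("author", [], [ValueNode(n)]) for n in ("Abiteboul", "Hull", "Vianu")]
year8 = elem_node("year", [], [ValueNode("1995")])
book1 = elem_node("book", {price2, currency3}, [title4, *authors, year8])

print(to_xml(book1))
```

Swapping the two attributes yields an equal `ElemNode`, while swapping two children does not, which reflects the set versus list distinction in the definition.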

39 XML vs. Semistructured Data
both described best by a graph both are schema-less, self-describing

40 Similarities and Differences
<person id="o123">
  <name> Alan </name>
  <age> 42 </age>
  < > </ >
</person>
{ person: &o123 { name: "Alan", age: 42, } }
<person father="o123"> … </person>
{ person: { father: &o123 … } }
[Figure: a person node with name and age edges to Alan and 42, and a father edge from a second person node]
similar on trees, different on graphs

41 More Differences
XML is ordered, ssd is not
XML can mix text and elements:
<talk> Making Java easier to type and easier to type
  <speaker> Phil Wadler </speaker>
</talk>
XML has lots of other stuff: entities, processing instructions, comments
Very important: these differences make XML data management harder
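Two of the differences listed above, order and mixed content, can be observed with Python's standard `xml.etree.ElementTree`. This is an illustrative sketch; the `<r>`, `<a>`, `<b>` documents are made up for the example.

```python
import xml.etree.ElementTree as ET

# Order matters: the same children in a different order give a different document.
first = [child.tag for child in ET.fromstring("<r><a/><b/></r>")]
second = [child.tag for child in ET.fromstring("<r><b/><a/></r>")]
print(first, second, first == second)  # ['a', 'b'] ['b', 'a'] False

# Mixed content: text sits next to a child element inside <talk>.
talk = ET.fromstring(
    "<talk> Making Java easier to type and easier to type "
    "<speaker> Phil Wadler </speaker> </talk>"
)
speaker = talk.find("speaker")
print(talk.text.strip())     # Making Java easier to type and easier to type
print(speaker.text.strip())  # Phil Wadler
print(repr(speaker.tail))    # ' '  (text after </speaker>, still inside <talk>)
```

A label-to-value encoding such as { talk: { speaker: ... } } has no natural slot for the text before the `<speaker>` element, which is one reason mixed content complicates storage and querying.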

42 Summary of Data Models
semistructured data, XML
data is self-describing, irregular
schema embedded with the data

