Download presentation
Presentation is loading. Please wait.
1
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
2
2 Structure in Data Representation Relational data is highly structured structure is defined by the schema good for system design good for precise query semantics / answers Structure can be limiting data exchange hard: integration of diff schema authoring is constrained: schema-first querying constrained: must know schema changes to structure not easy
3
3 Data Integration 1. Find all departments whose total employee salaries exceed 1% of the budget of the company. US Europe Asia Australia Internet 2. Find names of employees with the top sales record last month.
4
4 WWW Structured data - Databases Unstructured Text - Documents Semistructured Data Integration of Text and Structured Data
5
5 Need for A New Data Model Loose (and rich) structure Integration of structured, but heterogeneous data sources Evolving, unknown, or irregular structure Textual data with tags and links Combination of data models 5
6
6 XML: Universal Data Exchange Format XML is the confluence of many factors: Databases needed a more flexible interchange format. Data needed to be generated and consumed by applications. The Web needed a more declarative format for data. Documents needed a mechanism for extended tags. XML was originally proposed for online publishing, is becoming the wire format for data exchange. W3C Recommendation: http://www.w3.org/TR/REC-xml/
7
7 From HTML to XML HTML describes the presentation.
8
8 HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999
9
9 XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content!
10
10 XML: Syntax & Typing
11
11 XML Syntax Tags: book, title, author, … start tag: end tag: Elements: …, … elements are nested empty element:, abbrv. An XML document: single root element An XML document is well formed if it has matching tags
12
12 XML Syntax Foundations of Databases Abiteboul … 1995 Foundations of Databases Abiteboul … 1995 Attributes are alternative ways to represent data.
13
13 XML Syntax Jane Mary John Jane Mary John Oids and references in XML are just syntax.
14
14 XML Semantics: a Tree ! Mary Maple 345 Seattle John Thailand 23456 Mary Maple 345 Seattle John Thailand 23456 data Mary person name addres s name addres s streetnocity Maple345 Seattle John Thai phone 23456 idid o555 Element node Text node Attribute node Order matters ! IDREF will turn it to a graph.
15
15 XML Data XML is self-describing Schema elements become part of the data – Relational schema: persons(name,phone) – In XML,, are part of the data, and are repeated many times Consequence: XML is much more flexible Some real data: http://www.cs.washington.edu/research/xmldatasets/
16
16 Relational Data as XML John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John” 3634“Sue” “Dick” 6343 6363 person XML: person namephone John3634 Sue6343 Dick6363
17
17 XML is Semi-structured Data Missing attributes: Could represent in a table with nulls John 1234 Joe John 1234 Joe ← no phone ! namephone John1234 Joe-
18
18 XML is Semi-structured Data Repeated attributes Impossible in tables: nested collections (non 1NF) Mary 2345 3456 Mary 2345 3456 ← two phones ! namephone Mary23453456 ?? ?
19
19 XML is Semi-structured Data Attributes with different types in different objects Mixed content: – contains both s and s John Smith 1234 M. Carey 3456 John Smith 1234 M. Carey 3456 ← structured name ! ← unstructured name !
20
20 Data Typing in XML Data typing in the relational model: schema Data typing in XML – Much more complex – Typing restricts valid trees that can occur theoretical foundation: tree languages – Practical methods: DTD (Document Type Definition) XML Schema
21
21 Document Type Definitions ( DTD ) Part of the original XML specification To be replaced by XML Schema – Much more complex An XML document may have a DTD XML document: well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it Validation is useful in data exchange
22
22 DTD Example <!DOCTYPE company [ ]> <!DOCTYPE company [ ]>
23
23 DTD Example 123456789 John B432 1234 987654321 Jim B123... 123456789 John B432 1234 987654321 Jim B123... Example of valid XML document:
24
24 DTD: The Content Model Content model: – Complex = a regular expression over other elements – Text-only = #PCDATA – Empty = EMPTY – Any = ANY – Mixed content = (#PCDATA | A | B | C)* content model
25
25 DTD: Regular Expressions <!ELEMENT name (firstName, lastName)).......... <!ELEMENT name (firstName?, lastName)) DTDXML <!ELEMENT person (name, phone*)) sequence optional <!ELEMENT person (name, (phone|email))) Kleene star alternation......................
26
26 Attributes in DTDs..............
27
27 Attributes in DTDs <!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED > <!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED > <person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”>....... <person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”>.......
28
28 Attributes in DTDs Types: CDATA= string ID = key IDREF = foreign key IDREFS = foreign keys separated by space (Monday | Wednesday | Friday) = enumeration
29
29 Attributes in DTDs Kind: #REQUIRED #IMPLIED = optional value = default value value #FIXED = the only value allowed
30
30 Using DTDs Must include in the XML document Either include the entire DTD: – Or include a reference to it: – Or mix the two... (e.g. to override the external definition)
31
31 XML Schema DTDs capture grammatical structure, but have some drawbacks: Not themselves in XML, inconvenient to build tools Don’t capture database datatypes’ domains No way of defining OO-like inheritance… XML Schema addresses shortcomings of DTDs XML syntax Subclassing Domains and built-in datatypes nin. and max # of occurrences of elements http://www.w3.org/XML/Schema
32
32 Basics of XML Schema Need to use the XML Schema namespace (generally named xsd) simpleTypes are a way of restricting domains on scalars Can define a simpleType based on integer, with values within a particular range complexTypes are a way of defining element structures Basically equivalent to !ELEMENT, but more powerful Specify sequence, choice between child elements Specify minOccurs and maxOccurs (default 1) Must associate an element/attribute with a simpleType, or an element with a complexType
33
33 Simple Schema Example
34
34 Questions
35
35 How the Web was Yesterday HTML documents often generated by applications consumed by humans only easy access: across platforms, across organizations No application interoperability: HTML not understood by applications Database technology: client-server
36
36 Application Interoperability Purchase order Amazon Supplier1 Supplier2 Supplier3 Internet
37
37 Semi-structured data Structure may be: irregular implicit partial unknown 37
38
38 Examples Bibtex file Web data Integrated data sources The integration of structured data sources can result in semi-structured data. 38
39
39 New Universal Data Exchange Format: XML A recommendation from the W3C XML = data XML generated by applications XML consumed by applications Easy access: across platforms, organizations
40
40 XML A W3C standard to complement HTML Origins: Structured text SGML Large-scale electronic publishing Data exchange on the web Motivation: HTML describes presentation XML describes content http://www.w3.org/TR/2000/REC-xml-20001006 (version 2, 10/2000)
41
41 Paradigm Shift on the Web From documents (HTML) to data (XML) From information retrieval to data management For databases, also a paradigm shift: from relational model to XML model from data processing to data/query translation from storage to transport
42
42 Database Issues How are we going to model XML? (graphs). Compared to relational model, XML is hierarchical XML allows missing or additional attributes XML allows multiple instances of an attribute (set-valued) XML allows different types in different objects XML integrates structure and text data … How are we going to query XML? (XQuery) How are we going to store XML (in a relational database? object-oriented? native?) How are we going to process XML efficiently? (many interesting research questions!)
43
43 Designing an XML Schema/DTD Not as formalized as relational data design We can still use ER diagrams to break into entity, relationship sets Note that often we already have our data in relations and need to design the XML schema to export them! Generally orient the XML tree around the “central” objects Big decision: element vs. attribute Element if it has its own properties, or if you *might* have more than one of them Attribute if it is a single property – or perhaps not!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.