Managing XML and Semistructured Data

Slides:



Advertisements
Similar presentations
17 Apr 2002 XML Syntax: DTDs Andy Clark. Validation of XML Documents XML documents must be well-formed XML documents may be valid – Validation verifies.
Advertisements

1 DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs)W3Schools on DTDs.
XML 6.3 DTD 6. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:  Elements.
XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.
1 XML DTD & XML Schema Monica Farrow G30
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Document Type Definition DTDs CS-328. What is a DTD Defines the structure of an XML document Only the elements defined in a DTD can be used in an XML.
CS 898N – Advanced World Wide Web Technologies Lecture 21: XML Chin-Chih Chang
Document Type Definitions
CSE 636 Data Integration XML Semistructured Data Document Type Definitions.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
A Technical Introduction to XML Transparency No. 1 XML quick References.
 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.
Managing XML and Semistructured Data
Full declaration When an element is declared to have element content, the children element types must also be declared Example: to which the following.
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
XML Verification Well-formed XML document  conforms to basic XML syntax  contains only built-in character entities Validated XML document  conforms.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
4/20/2017.
XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay,
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.
Chapter 4: Document Type Definitions. Chapter 4 Objectives Learn to create DTDs Validate an XML document against a DTD Use DTDs to create XML documents.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
Document Type Definitions Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
XML Structures For Existing Databases Ref: 106.ibm.com/developerworks/xml/library/x-struct/
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
XP 1 DECLARING A DTD A DTD can be used to: –Ensure all required elements are present in the document –Prevent undefined elements from being used –Enforce.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
XML (2) DTD Sungchul Hong.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
Winter 2006Keller, Ullman, Cushing18–1 Plan 1.Information integration: important new application that motivates what follows. 2.Semistructured data: a.
IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy.
XML Validation I DTDs Robin Burke ECT 360 Winter 2004.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
An Introduction to XML Sandeep Bhattaram
McGraw-Hill/Irwin © 2004 by The McGraw-Hill Companies, Inc. All rights reserved. Understanding How XML Works Ellen Pearlman Eileen Mullin Programming the.
1/11 ITApplications XML Module Session 3: Document Type Definition (DTD) Part 1.
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
More XML: semantics, DTDs, XPATH February 18, 2004.
Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.
INFSY 547: WEB-Based Technologies Gayle J Yaverbaum, PhD Professor of Information Systems Penn State Harrisburg.
SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.
QUALITY CONTROL WITH SCHEMAS CSC1310 Fall BASIS CONCEPTS SchemaSchema is a pass-or-fail test for document Schema is a minimum set of requirements.
1 Lecture 13: XQuery XML Publishing, XML Storage Monday, October 28, 2002.
Document Type Definition (DTD) Eugenia Fernandez IUPUI.
XML Validation II Advanced DTDs + Schemas Robin Burke ECT 360.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
Copyrighted material John Tullis 3/18/2016 page 1 04/29/00 XML Part 4 John Tullis DePaul Instructor
CH 9 Attribute Declaration 1. Objective What is an attribute Declaring attributes Declaring multiple attribute Alternatives to default attributes values.
CITA 330 Section 2 DTD. Defining XML Dialects “Well-formedness” is the minimal requirement for an XML document; all XML parsers can check it Any useful.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Management of XML and Semistructured Data
Web Programming Maymester 2004
Lecture 11 XML Wednesday, Oct. 24, 2001.
XML: Schemas, Queries Wednesday, 4/17/2002
Managing XML and Semistructured Data
New Perspectives on XML
Lecture 9: XML Monday, October 17, 2005.
DTD (Document Type Definition)
CSE 544: Lecture 5 XML 4/15/2002.
Lecture 8: XML Data Wednesday, October
Introduction to Database Systems CSE 444 Lecture 10 XML
Lecture 11: XML and Semistructured Data
Document Type Definition (DTD)
Presentation transcript:

Managing XML and Semistructured Data Lecture 11: Schema - DTD’s Prof. Dan Suciu Spring 2001

In this lecture Schema Extraction for SS data Schemas for XML DTDs XML Schema Resources Extracting Schema from Semistructured Data Nestorov, Abiteboul, Motwani. SIGMOD 98 Data on the Web Abiteboul, Buneman, Suciu : section 3.3

Review of Schemas so far Upper bound schema S Tell us what labels are allowed Conformance test: D  S In practice: need deterministic schemas Lower bound schema S Tells us what labels are required Conformance test: S  D Alternative formulation: datalog programs, maximal fixpoint

Schema Extraction (From Data) Problem statement given data instance D find the “most specific” schema S for D In practice: S too large, need to relax [Nestorov, Abiteboul, Motwani 1998]

Schema Extraction: Sample Data Example database D = &r employee employee employee employee employee employee employee employee manages manages manages manages manages &p1 &p2 &p3 &p4 &p5 &p6 &p7 &p8 managedby managedby managedby managedby managedby worksfor worksfor worksfor company worksfor worksfor worksfor worksfor worksfor &c

Lower Bound Schema Extraction [NAM’98] approach: Start with the schema given by the data (S = D): Each node = a predicate = a class Compute maximal fixpoint (PTIME) Declare two classes equal iff they are equal sets E.g. p4={&p1,&p4,&p6}, p6={&p1,&p4,&p6}, hence p1=p4 Equivalently, p=p’ iff p(&p’) and p’(&p) . . . p4(x) :- link(x, manages, y), p5(y), link(x, worksfor, z), c(z) p5(x) :- link(x, managed-by, y), p4(y), link(x, worksfor, z), c(z)

Lower Bound Schema Extraction Result S = Root &r employee company employee Bosses &p1,&p4,&p6 manages Regulars &p2,&p3,&p5,&p7,&p8 worksfor managedby Company &c worksfor

Lower Bound Schema Extraction Equivalently: Compute the maximal simulation D  D Can do in time O(m2) Two nodes p, p’ are equivalent iff x  x’ and x’  x Schema consists of equivalence classes Remark: could use the bisimulation relation instead (perhaps is even better)

Upper Bound Schema Extraction The extracted lower bound schema S is also an upper bound schema ! But: nondeterministic Convert S  Sd Alternatively, convert directly D  Dd = Sd These are data guides [McHugh and Widom]

Upper Bound Schema Extraction Result Sd = Root &r employee Employees &p1,&p1,&p3,P4 &p5,&p6,&p7,&p8 company managedby manages worksfor Bosses &p1,&p4,&p6 manages Regulars &p2,&p3,&p5,&p7,&p8 worksfor managedby Company &c worksfor

XML Document Type Definitions part of the original XML specification an XML document may have a DTD terminology for XML: well-formed: if tags are correctly closed valid: if it has a DTD and conforms to it validation is useful in data exchange

Very Simple DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Very Simple DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> <product> ... </product> ... </company>

Content Model Element content: what we can put in an element (aka content model) Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA | A | B | C)* (i.e. very restrictied)

Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIS person age CDATA #REQUIRED> <person age=“25”> <name> ....</name> ... </person>

Attributes in DTDs <!ELEMENT person (ssn, name, office, phone?)> <!ATTLIS person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED > <person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ... </person>

Attributes in DTDs Types: CDATA = string ID = key IDREF = foreign key IDREFS = foreign keys separated by space (Monday | Wednesday | Friday) = enumeration NMTOKEN = must be a valid XML name NMTOKENS = multiple valid XML names ENTITY = you don’t want to know this

Attributes in DTDs Kind: #REQUIRED #IMPLIED = optional value = default value value #FIXED = the only value allowed

Using DTDs Must include in the XML document Either include the entire DTD: <!DOCTYPE rootElement [ ....... ]> Or include a reference to it: <!DOCTYPE rootElement SYSTEM “http://www.mydtd.org”> Or mix the two... (e.g. to override the external definition)

DTDs as Grammars <!DOCTYPE paper [ <!ELEMENT paper (section*)> <!ELEMENT section ((title,section*) | text)> <!ELEMENT title (#PCDATA)> <!ELEMENT text (#PCDATA)> ]> <paper> <section> <text> </text> </section> <section> <title> </title> <section> … </section> <section> … </section> </section> </paper>

DTDs as Grammars A DTD = a grammar A valid XML document = a parse tree for that grammar

like an upper bound schema DTDs as Schemas Not so well suited: impose unwanted constraints on order <!ELEMENT person (name,phone)> references cannot be constrained can be too vague: <!ELEMENT person ((name|phone|email)*)> like an upper bound schema