Semistructured-Data Model

Lu Chaojun, SJTU

Semistructured Data
Structured data has a separate schema to describe its structure.
– Advantage: efficient implementation of storage organization and query processing.
Semistructured data is self-describing, i.e., the data itself carries information about what its schema is.
– Advantage: flexibility in adding new attributes and relationships. That is, the schema can vary arbitrarily, both over time and within a single database.

Semistructured-Data Model
Provides flexible conceptual tools to describe the real world.
It is a kind of data model that
– is suitable for integration of heterogeneous databases, and
– serves as the underlying model for XML, which is being used to share information on the Web.

Graph Representation
A database of semistructured data is a collection of nodes.
Nodes = objects.
– Leaf nodes have associated data of atomic types.
– Interior nodes have arcs out.
Root node has no arcs entering and represents the entire database.
Label on arc: indicates how the target node relates to the source node.
– No restriction on labels: they may represent attributes or relationships.

The Graph
Nodes are connected in a rooted graph structure.
[Figure: example graph with arc labels sno, name, takes, cno and leaf values 007, j.bond, CS123]

Example
[Figure: rooted graph with arc labels bar, beer, name, addr, manf, servedAt, prize, year, award; leaf values include Joe’s, Maple, Bud, A.B., M’lob, 1995, Gold]

Application: Info. Integration
Problem: related data exists in many places and needs to be accessible as if it were one DB.
– Integration of heterogeneous DB’s, e.g., after a company merger.
– The DB’s differ in data models and schemas, even if they talk about the same thing.
Create a new DB to solve the problem?
– Cost

Legacy Databases
Legacy-database problem: once a DB has been in existence for a while, it becomes impossible to disentangle it from the applications that grow up around it, so the DB can never be decommissioned.
– Even if we could efficiently transform the data from one schema to another, we shouldn’t do so.

A Possible Solution
Integrating two legacy databases through an interface that supports semistructured data.
[Figure labels: Legacy DB, Interface, Other Applications, User, Query]

Mediation

XML
Extensible Markup Language
– Designed originally for marking documents.
– But here treated as a data model.
HTML vs. XML
– HTML uses tags for presentation (formatting) (e.g., “italic”).
– XML uses tags for semantics (e.g., “this is an address”).
XML captures, in a linear form, the same structure as do the graphs of semistructured data.
– Tags play the same role as do the labels on the arcs of a semistructured-data graph.

Semantic Tags
Tags:
– Come in pairs: an opening tag is balanced by a matching closing tag.
– There can be text between them.
– The abbreviated single-tag form (a tag ending in />) means no text in between.
Element: a pair of matching tags and everything that comes between them.
– Tags may be nested, as in … … …
– XML is case-sensitive.

XML vs Semistructured Data
[Figure labels: T-node, S-node, S]
Only allows tree structure?

XML Used in Two Modes
Well-formed XML
– No predefined schema: documents are free to use whatever tags the author wishes.
– Corresponds closely to semistructured data.
Valid XML
– Conforms to a DTD (Document Type Definition) that specifies the allowable tags and gives a grammar for how they may be nested.
– This form is intermediate between the strict-schema models and the completely schemaless model of semistructured data.

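Well-Formed XML
Minimal requirements:
1. The document begins with a declaration that it is XML (the <?xml … ?> line).
2. It has a root element that is the entire body of the document.
Outer structure: the declaration, followed by the root element's opening tag, the document body, and the root element's closing tag.
In the declaration, standalone="yes" means that no external DTD is needed to interpret the document.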

Example
007 James Bond CS123 CS Stephen Chow CS123
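A minimal sketch of what a well-formed document carrying these values could look like, checked with Python's standard `xml.etree.ElementTree` parser. The tag names STUDENTS, STUDENT, SNO, NAME and CNO, the second course CS456, and the student number 008 are assumptions chosen to match the values and later slides; the `is_well_formed` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup: only the text values appear in the transcript.
doc = """<?xml version="1.0" standalone="yes"?>
<STUDENTS>
  <STUDENT><SNO>007</SNO><NAME>James Bond</NAME><CNO>CS123</CNO><CNO>CS456</CNO></STUDENT>
  <STUDENT><SNO>008</SNO><NAME>Stephen Chow</NAME><CNO>CS123</CNO></STUDENT>
</STUDENTS>"""


def is_well_formed(text):
    """True if `text` parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


names = [s.findtext("NAME") for s in ET.fromstring(doc).findall("STUDENT")]
print(is_well_formed(doc), names)
print(is_well_formed("<Name>x</name>"))  # tags are case-sensitive, so these do not match
```

Well-formedness is purely syntactic: the parser accepts any tag names as long as they nest properly under a single root.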

Attributes
Attributes are intended for extra information associated with an element that is used only by programs that read and write the file, not for the content of the element that is read and written by humans.
Attributes (name-value pairs) appear within the opening tag.
– Alternative way to represent leaf nodes or labelled arcs of semistructured data.

Example
James Bond CS123 CS456
– Note: SNO here is no longer part of the content of the document, but part of the markup.

Attributes that Connect Elements
Represent connections in a semistructured-data graph that do not form a tree.
– Element ID’s vs. references
Example: James Bond Database Systems
[Callouts: attribute of type ID; attribute of type IDREF]

Namespaces
A namespace associates a URI with a tag set and attaches a prefix to element and attribute names, in order to:
– disambiguate mixed use of multiple markup vocabularies, and
– avoid name conflicts.
Definition of a namespace: an xmlns:myns attribute on an element binds the prefix myns to a URI.
– myns is meaningful only within this element.

Example: Namespace
In general:
<sjtu:Students xmlns:sjtu="…" sjtu:SNO="007"> …
Default namespace:
<Students xmlns="…" SNO="007"> …
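To see how a parser interprets the two forms, here is a small sketch using Python's `xml.etree.ElementTree`, which expands every qualified name to `{URI}localname`. The URI `http://example.org/sjtu` is a placeholder assumption, since the actual URI is not given in the slide.

```python
import xml.etree.ElementTree as ET

SJTU = "http://example.org/sjtu"  # hypothetical URI, not from the slides

prefixed = ET.fromstring(f'<sjtu:Students xmlns:sjtu="{SJTU}" sjtu:SNO="007"/>')
default = ET.fromstring(f'<Students xmlns="{SJTU}" SNO="007"/>')

# Both elements end up with the same expanded tag name.
print(prefixed.tag)     # {http://example.org/sjtu}Students
print(default.tag)      # {http://example.org/sjtu}Students

# A default namespace applies to element names only, not to unprefixed attributes.
print(prefixed.attrib)  # {'{http://example.org/sjtu}SNO': '007'}
print(default.attrib)   # {'SNO': '007'}
```

The prefix itself carries no meaning; only the URI it is bound to identifies the vocabulary.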

XML and DB
XML was originally designed for document processing, not data processing.
XML is often used for exchange/sharing of information over the Internet.
– Publishing and shredding: DB1 → XML → DB2
XML can also be used to store large amounts of data with a strict schema.
– Stored in a specialized XML DBMS?
– Stored in an RDB?

Storing XML in RDB
Method I:
– Documents(docID, strXML)
Method II:
– DocRoot(docID, rootElementID)
– SubElement(parentID, childID, position)
– ElementAttribute(elementID, name, value)
– ElementValue(elementID, value)
Method III:
– SQL:2003 provides an XML type.
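Method II can be sketched with Python's built-in `sqlite3` and `xml.etree.ElementTree`. The `shred` function, the choice of pre-order element IDs, and the sample document are assumptions for illustration. The sketch follows the four relations literally; note that, as listed, they do not record element tag names, so a practical design would likely add a column for them.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count


def shred(conn, doc_id, xml_text):
    """Store one document in the Method II tables; element IDs follow document order."""
    ids = count(1)

    def visit(elem, parent_id, position):
        elem_id = next(ids)
        if parent_id is None:
            conn.execute("INSERT INTO DocRoot VALUES (?, ?)", (doc_id, elem_id))
        else:
            conn.execute("INSERT INTO SubElement VALUES (?, ?, ?)",
                         (parent_id, elem_id, position))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO ElementAttribute VALUES (?, ?, ?)",
                         (elem_id, name, value))
        text = (elem.text or "").strip()
        if text:  # only elements with text content get an ElementValue row
            conn.execute("INSERT INTO ElementValue VALUES (?, ?)", (elem_id, text))
        for pos, child in enumerate(elem, start=1):
            visit(child, elem_id, pos)

    visit(ET.fromstring(xml_text), None, 0)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocRoot(docID, rootElementID);
CREATE TABLE SubElement(parentID, childID, position);
CREATE TABLE ElementAttribute(elementID, name, value);
CREATE TABLE ElementValue(elementID, value);
""")
shred(conn, 1, '<STUDENT SNO="007"><NAME>James Bond</NAME><CNO>CS123</CNO></STUDENT>')
values = [row[0] for row in
          conn.execute("SELECT value FROM ElementValue ORDER BY elementID")]
print(values)
```

Method I, by contrast, would be a single insert of the whole document string, which is simple to store but gives SQL no access to the document's internal structure.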

Document Type Definitions
Grammar-like set of rules describing
– what tags can appear in documents
– how tags can be nested
The intention is that DTD’s will be standards for a domain, used by everyone preparing or using data in that domain.
– Establishing a shared view of the semantics of their elements.
– Example: a DTD for describing protein structure, etc.

Gross Structure of a DTD
<!DOCTYPE root-tag [ more elements ]>
root-tag is used (with its matching ender) to surround a document that conforms to the rules of this DTD.

DTD Elements
An element is described by its name (tag) and a parenthesized list of components (nested elements) within it.
– Including the order of subelements and their multiplicity.
– Leaves (text elements) have (#PCDATA) as components.
– Special case: EMPTY indicates that the element has no subelements.

Example
<!DOCTYPE STUDENTS [ ]>

Components
The components of an element are the subelements that appear nested within it, in the order specified.
Multiplicity of a subelement:
a) * = zero or more.
b) + = one or more.
c) ? = zero or one.
In addition, | = “or”.
– e.g. (#PCDATA | (STREET, CITY))

Example: Element Description
A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:
<!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR )>
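A DTD content model is essentially a regular expression over the sequence of child tags. As a rough illustration (not how a validating parser is actually implemented), the NAME model above can be translated by hand into a Python regex; the `matches_name_model` helper and the sample elements are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (TITLE?, FIRST, LAST) | IPADDR, written over child tag names,
# each followed by one space so that tag boundaries are unambiguous.
NAME_MODEL = re.compile(r"(?:TITLE )?FIRST LAST |IPADDR ")


def matches_name_model(elem):
    """Check only the order and multiplicity of the children of a NAME element."""
    child_tags = "".join(child.tag + " " for child in elem)
    return NAME_MODEL.fullmatch(child_tags) is not None


ok = ET.fromstring("<NAME><TITLE>Prof.</TITLE><FIRST>A</FIRST><LAST>B</LAST></NAME>")
bad = ET.fromstring("<NAME><LAST>B</LAST><FIRST>A</FIRST></NAME>")
print(matches_name_model(ok), matches_name_model(bad))
```

The `?` after TITLE corresponds directly to the DTD's optional multiplicity, and `|` to the DTD's "or".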

Using a DTD
1. Set standalone = "no".
2. Either
a) include the DTD as a preamble to the document, or
b) follow the xml declaration with a DOCTYPE declaration giving the root tag, the keyword SYSTEM, and a file where the DTD can be found.

Example of (a)
<!DOCTYPE STUDENTS [ ]>
007 James Bond CS123 CS Stephen Chow

Example of (b)
Suppose the DTD is in file stud.dtd:
007 James Bond CS123 CS Stephen Chow

Attributes Declaration in DTD
In a DTD, <!ATTLIST E A T V> declares attribute A for element E, along with its datatype T and default value V.
– Common types: CDATA, enumerations, ID, IDREF, IDREFS, …
– Default value may be "def_value", #REQUIRED, #IMPLIED, or #FIXED "fixed_value".
– Several attributes can be declared in one ATTLIST statement, but this may not be good style.

Example
Example of use:
<STUDENT SNO = "007" NAME = "James Bond" DEPT = "CS" />
<STUDENT SNO = "008" NAME = "Stephen Chow" AGE = "47" DEPT = "EE" />

ID and IDREF
These support pointers from one object to another.
– They allow the structure of an XML document to be a general graph, rather than just a tree.
An attribute of type ID can be used to give the element a unique identifier.
An attribute of type IDREF refers to some element by its ID.
– Type IDREFS allows an attribute to contain multiple references.

Example: DTD
<!DOCTYPE UNIVERSITY [ ]>

Example: A Document
<STUDENT SNO = "007" TAKES = "CS123 CS456"> James Bond Stephen Chow DB OS
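A sketch of how IDREFS turn the tree into a graph, using Python's `xml.etree.ElementTree`. The parser does not interpret ID/IDREF itself, so the references are resolved by hand. The document below is an assumed reconstruction: the UNIVERSITY and COURSE elements, the CNO attribute (taken to be of type ID), and the TAKES value for student 008 are not given in the transcript.

```python
import xml.etree.ElementTree as ET

# Assumed document: CNO plays the role of an ID attribute, TAKES of an IDREFS attribute.
doc = """<UNIVERSITY>
  <STUDENT SNO="007" TAKES="CS123 CS456">James Bond</STUDENT>
  <STUDENT SNO="008" TAKES="CS123">Stephen Chow</STUDENT>
  <COURSE CNO="CS123">DB</COURSE>
  <COURSE CNO="CS456">OS</COURSE>
</UNIVERSITY>"""

root = ET.fromstring(doc)
by_id = {course.get("CNO"): course for course in root.findall("COURSE")}


def courses_taken(student):
    """Follow each whitespace-separated reference in TAKES to its COURSE element."""
    return [by_id[ref].text for ref in student.get("TAKES", "").split()]


bond = root.find("STUDENT[@SNO='007']")
print(courses_taken(bond))
```

Both students point at the same CS123 element, a sharing that pure nesting could only express by duplicating the course.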

XML Schema
A more powerful way to describe the schema of XML documents.
XML Schema declarations are themselves XML documents.
– They describe “elements” and the things doing the describing are also “elements.”

Form of an XML Schema
<xs:schema xmlns:xs = "…">
The xmlns:xs attribute defines "xs" to be the namespace identified by the given URL, so uses of "xs" within the schema element refer to tags from this namespace.

Element Definition
Use the xs:element element. It has attributes:
1. name = the tag-name of the element being defined.
2. type = the type of the element being defined.
– Could be an XML-Schema type, e.g., xs:string.
– Or the name of a type defined in the document itself.

Example
<xs:element name = "NAME" type = "xs:string" />
Describes elements such as <NAME>James Bond</NAME>

Complex Types
To describe elements that consist of subelements, we use xs:complexType.
– Attribute name gives a name to the type.
Typical subelement of a complex type is xs:sequence, which itself has a sequence of xs:element subelements.
– Use minOccurs and maxOccurs attributes to control the number of occurrences of an xs:element.

Example: Element Type Def
<xs:element name = "SNO" type = "xs:string" minOccurs = "1" maxOccurs = "1" />
<xs:element name = "NAME" type = "xs:string" minOccurs = "1" maxOccurs = "unbounded" />
<xs:element name = "AGE" type = "xs:integer" minOccurs = "0" maxOccurs = "1" />

Example: Elements of the Type 007 James Bond 008 Stephen Chow Zhou Xingxing Unknown from previous slide
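The effect of minOccurs and maxOccurs can be illustrated with a small hand-written checker. This is a sketch under stated assumptions, not a real XML Schema validator: the enclosing xs:complexType and xs:sequence wrappers, the type name `studentType`, the `conforms` function, and the sample elements are all assumed, and only occurrence counts and order are checked, not the declared types. The namespace URI is the standard W3C XML Schema namespace.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="studentType">
    <xs:sequence>
      <xs:element name="SNO" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="NAME" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
      <xs:element name="AGE" type="xs:integer" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>""")
student_type = schema.find(XS + "complexType")


def conforms(elem, complex_type):
    """Greedily match the children of `elem` against the xs:sequence declarations."""
    children = [child.tag for child in elem]
    i = 0
    for decl in complex_type.find(XS + "sequence"):
        lo = int(decl.get("minOccurs", "1"))
        hi = decl.get("maxOccurs", "1")
        hi = float("inf") if hi == "unbounded" else int(hi)
        n = 0
        while i < len(children) and children[i] == decl.get("name") and n < hi:
            i += 1
            n += 1
        if n < lo:
            return False
    return i == len(children)  # no unexpected trailing children


two_names = ET.fromstring(
    "<S><SNO>008</SNO><NAME>Stephen Chow</NAME><NAME>Zhou Xingxing</NAME></S>")
no_sno = ET.fromstring("<S><NAME>James Bond</NAME></S>")
print(conforms(two_names, student_type), conforms(no_sno, student_type))
```

Greedy matching is sufficient here only because each declared name is distinct; a general content model would need a proper automaton.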

Attribute Definition
xs:attribute elements can be used within a complex type to indicate attributes of elements of that type.
Attributes of xs:attribute:
– name and type as for xs:element.
– default = default value.
– use = "required" or "optional".

Example
<xs:attribute name = "SNO" type = "xs:string" use = "required" />
<xs:attribute name = "NAME" type = "xs:string" use = "optional" />
<xs:attribute name = "AGE" type = "xs:integer" default = "18" />

An Element of studentType
<xxx SNO = "007" NAME = "James Bond" />
We still don’t know the element name. The element is empty, since there are no declared subelements.

Restricted Simple Types
xs:simpleType can describe enumerations and range-restricted base types.
– name is an attribute indicating the type name.
xs:restriction is a subelement.
– Attribute base gives the simple type to be restricted, e.g., xs:integer.

Restrictions
xs:{min|max}{Inclusive|Exclusive} are four elements that, with attribute value, can give lower or upper bounds on a numerical range.
xs:enumeration is a subelement with attribute value that allows enumerated types.

Example: Age Range [1,180)
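One way the half-open range [1,180) could be written is with xs:minInclusive and xs:maxExclusive facets. The sketch below embeds such a definition in Python and interprets the facets by hand; the type name `ageType` and the `in_range` helper are assumptions, and this is not a full schema validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed definition of the restricted type; the name "ageType" is hypothetical.
age_type = ET.fromstring("""
<xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="ageType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1"/>
    <xs:maxExclusive value="180"/>
  </xs:restriction>
</xs:simpleType>""")


def in_range(text, simple_type):
    """Apply each bound facet of the xs:restriction to the integer in `text`."""
    value = int(text)
    checks = {
        "minInclusive": lambda bound: value >= bound,
        "minExclusive": lambda bound: value > bound,
        "maxInclusive": lambda bound: value <= bound,
        "maxExclusive": lambda bound: value < bound,
    }
    restriction = simple_type.find(XS + "restriction")
    return all(checks[facet.tag[len(XS):]](int(facet.get("value")))
               for facet in restriction)


print([in_range(v, age_type) for v in ("0", "1", "179", "180")])
```

The lower bound is inclusive and the upper bound exclusive, matching the bracket and parenthesis in [1,180).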

Keys in XML Schema
An xs:element can have an xs:key subelement.
Meaning: within this element, all subelements reached by a certain selector path will have unique values for a certain combination of fields.
Example: within one BAR element, the name attribute of a BEER element is unique.

Example: Key
XPath is a query language for XML.
A path is a sequence of tags separated by /.
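The meaning of a key (a scope, a selector path, and a field) can be sketched by checking uniqueness by hand with the limited XPath subset that Python's `xml.etree.ElementTree` supports. The `key_holds` helper, the enclosing BARS element, and the sample bars and beers are hypothetical; a real XML Schema processor would enforce this from the xs:key declaration.

```python
import xml.etree.ElementTree as ET


def key_holds(scope, selector, field):
    """True if every element selected under `scope` has attribute `field`
    and no two selected elements share its value."""
    values = [elem.get(field) for elem in scope.findall(selector)]
    return None not in values and len(values) == len(set(values))


# Hypothetical data: the second bar lists the same beer name twice.
bars = ET.fromstring("""
<BARS>
  <BAR name="Joe's"><BEER name="Bud"/><BEER name="Miller"/></BAR>
  <BAR name="Sue's"><BEER name="Bud"/><BEER name="Bud"/></BAR>
</BARS>""")

results = [key_holds(bar, "BEER", "name") for bar in bars.findall("BAR")]
print(results)
```

Uniqueness is checked per BAR element, so the same beer name may appear under different bars without violating the key.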

Foreign Keys
An xs:keyref subelement within an xs:element says that within this element, certain values (defined by selector and field(s), as for keys) must appear as values of a certain key.

Example
Suppose that we have declared that subelement CNO of COURSE is a key.
– The name of the key is cKey.
We wish to declare STUDENT elements that have TAKES subelements. An attribute cno of TAKES is a foreign key, referring to the CNO of a COURSE.

Example (cont.)
... <xs:keyref name = "cRef" refer = "cKey" ...
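What the keyref requires can be sketched as a referential-integrity check done by hand in Python. The enclosing UNIVERSITY element and the sample data (including the dangling reference CS789) are assumptions for illustration; a schema processor would derive the same check from the selector and field of cRef and cKey.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: CNO subelements of COURSE form the key,
# the cno attribute of TAKES is the foreign key.
doc = ET.fromstring("""
<UNIVERSITY>
  <COURSE><CNO>CS123</CNO></COURSE>
  <COURSE><CNO>CS456</CNO></COURSE>
  <STUDENT><TAKES cno="CS123"/><TAKES cno="CS789"/></STUDENT>
</UNIVERSITY>""")

key_values = {course.findtext("CNO") for course in doc.findall("COURSE")}
dangling = [takes.get("cno")
            for takes in doc.findall("STUDENT/TAKES")
            if takes.get("cno") not in key_values]
print(dangling)
```

Any value listed in `dangling` is a TAKES reference with no matching COURSE, which is exactly what the foreign-key constraint forbids.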

End