2005 1 XML eXtensible Markup Language Part 2.


1 2005 http://www.cs.huji.ac.il/~dbi 1 XML eXtensible Markup Language Part 2

2 XML Entities

3 XML Entities should not be Confused with Entities in the Sense of the ER Model
An entity is a short string that denotes more complex information, which may reside inside or outside the XML document or its DTD
Entities save typing
Entities facilitate easy changes (when the same change is likely to be made in many places)
Sometimes entities must be used to circumvent XML syntax violations
Applications should decode and encode entities, using their definitions
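
The last two points can be sketched with Python's standard library, assuming `xml.etree.ElementTree` as the application-side parser (the slides do not name one). The predefined entities `&lt;` and `&amp;` are decoded on parsing and encoded again on serialization; the element name `p` is only illustrative.

```python
import xml.etree.ElementTree as ET

# Writing "<" or "&" literally in text would violate XML syntax,
# so the document uses the predefined entities instead.
elem = ET.fromstring("<p>a &lt; b &amp; c</p>")

# Parsing decodes the entities into the characters they denote.
decoded = elem.text

# Serializing encodes the special characters again.
encoded = ET.tostring(elem, encoding="unicode")

print(decoded)
print(encoded)
```

The application works with the decoded string and never needs to handle the escaped form itself.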

4 General entities
A general entity is defined in the DTD
It is used in the document by writing &Name;
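
A minimal sketch of a general entity, assuming Python's `xml.etree.ElementTree` as the parser. The document, the root name `memo`, and the entity name `director` are hypothetical and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset defines the general entity "director";
# the document body refers to it as &director;.
DOC = """<!DOCTYPE memo [
  <!ENTITY director "Woody Allen">
]>
<memo>Directed by &director;</memo>"""

memo = ET.fromstring(DOC)

# The parser has already replaced the reference with its definition.
print(memo.text)
```

Changing the definition in the DTD changes every place where `&director;` is used, which is the "easy changes" benefit mentioned earlier.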

5 Example
<!DOCTYPE mdb [ ]>
Oh God! Woody Allen $2M

6 Browser View

7 Parameter Entities
Parameter entities are used only within DTDs
Internal entities have their replacement text given within the DTD
External entities draw their replacement text from outside files
Parameter entity declaration: <!ENTITY % name "replacement text">

8 An Example of a Parameter Entity
<!ATTLIST person
  friend (yes | no) #IMPLIED
  id ID #REQUIRED
  knows IDREFS #IMPLIED>

9 Unparsed Entities
<!DOCTYPE mdb [
<!ATTLIST movie
  id ID #REQUIRED
  opinion CDATA #IMPLIED
  starimage ENTITY #IMPLIED>
]>
Entities are defined
Types are defined
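
A sketch of how an application might see an unparsed entity, assuming Python's `xml.parsers.expat`. Only the `starimage` attribute comes from the slide; the entity name `woody`, the file `woody.gif`, and the notation `gif` are assumptions made for illustration.

```python
import xml.parsers.expat

# The NOTATION names a non-XML data type; the NDATA entity points at a file
# of that type. The parser reports the declaration but never reads the file.
DOC = """<!DOCTYPE mdb [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY woody SYSTEM "woody.gif" NDATA gif>
  <!ELEMENT mdb (movie*)>
  <!ELEMENT movie (#PCDATA)>
  <!ATTLIST movie starimage ENTITY #IMPLIED>
]>
<mdb><movie starimage="woody">Oh God!</movie></mdb>"""

unparsed = []

def on_unparsed_entity(entity_name, base, system_id, public_id, notation_name):
    # Record the entity name, the file it refers to, and its notation.
    unparsed.append((entity_name, system_id, notation_name))

parser = xml.parsers.expat.ParserCreate()
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.Parse(DOC, True)

print(unparsed)
```

It is then up to the application to decide what to do with the file that the attribute value `woody` refers to.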

10 Data
Oh God! Woody Allen $2M

11 Defining Entities
Entities can be defined
–in the local document as part of the DOCTYPE definition
–with a link to external files that contain the entity data (this, too, is done through the DOCTYPE definition)
–in an external DTD
Define locally when the entity is being used only in one particular document
Define by a link to an external file when the entity is being used in many documents

12 Defining Entities – An Example
Local Definition:
<!DOCTYPE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
]>
Global Definition:
<!DOCTYPE [
<!ENTITY copyright SYSTEM "http://www.worldspins.com/legal/copyright.xml">
]>

13 Another Example
<!DOCTYPE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact legal@worldspins.com.">
]>

14 Example (cont’d)
Mini-globe revolutionizes keychain industry
Today As The World Spins introduces a new approach to key chains. With the new MINI-GLOBE keys can be kept inside a chain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one.
&trademark;&copyright;

15 XML Namespaces

16 XML Namespaces
When an element name appears in two different XML documents, we would like to know that it has the same meaning in both documents
–Is the tag used as the XHTML tag in both documents?
–If two documents about books have the tag, does it mean that they use the same system for cataloging books?

17 What XML Namespaces are and What They are not
Namespaces merely provide a mechanism for creating unique names (for elements and attributes) that can be used in XML documents all over the Web
–A namespace is just a collection of names that were created for a specific domain of applications
Namespaces are not DTDs and they do not provide a mechanism for validation of XML documents using multiple DTDs

18 Identifying an XML Namespace
A namespace is identified by a URI
The URI does not have to point to anything
–It is merely used as a mechanism for creating unique names
An element or attribute name from a namespace has two parts
prefix:name
prefix identifies the namespace
name is just a name from the namespace

19 Namespaces are not Part of the XML 1.0 Recommendation
When an XML 1.0 parser sees a qualified name prefix:name, the parser treats this name just as it would treat any other attribute or element name (it is legal to use the character “:” in element and attribute names)
Namespaces must be hardwired into DTDs
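
The difference between plain XML 1.0 parsing and namespace-aware processing can be sketched with Python's `xml.parsers.expat`, which offers both modes. The prefix `foo`, the URI `http://example.org/people`, and the element names are assumptions made for illustration.

```python
import xml.parsers.expat

DOC = ('<foo:person xmlns:foo="http://example.org/people">'
       '<foo:name>John Doe</foo:name></foo:person>')

def element_names(namespace_aware):
    # namespace_separator=None gives plain XML 1.0 name handling;
    # passing a separator string switches on namespace processing.
    separator = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(DOC, True)
    return names

plain = element_names(False)   # "foo:person" is just a name containing ":"
aware = element_names(True)    # the prefix is replaced by the URI

print(plain)
print(aware)
```

In the plain mode the colon has no special meaning, which is why a DTD has to spell out the prefixed names literally.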

20 But
When an application sees a qualified name, it may recognize it and act accordingly
–A browser identifies tags that belong to the XHTML namespace and processes them
–An XSLT processor identifies tags and attributes that belong to the XSLT namespace and executes them

21 The W3C Recommendation for Namespaces in XML
The two-part naming system is the only thing defined in the W3C Namespace recommendation
–and even that is not so short!
This recommendation is just a collection of syntactic rules
–Some rules are rather subtle

22 Declaring a Namespace
An XML namespace is declared in an xmlns attribute (xmlns:foo binds the prefix foo to the namespace URI)
XML Namespaces John Doe
Using foo as the prefix, instead of using the URI, is more convenient

23 The Default Namespace
The default namespace is declared without a prefix
XML Namespaces John Doe
All the elements belong to the default namespace

24 Technically
The namespace mechanism is just a mapping from prefixes to URIs, e.g.,
– is replaced with
It is done in a processing layer that operates on the element tree resulting from XML 1.0 parsing
It creates unique names
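
The prefix-to-URI mapping can be sketched with Python's `xml.etree.ElementTree`, which reports each element name in the expanded form `{URI}name`. The URI `http://example.org/people` and the element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same document: two different prefixes and a
# default namespace, all bound to the same URI.
PREFIXED = ('<foo:person xmlns:foo="http://example.org/people">'
            '<foo:name>John Doe</foo:name></foo:person>')
OTHER_PREFIX = ('<p:person xmlns:p="http://example.org/people">'
                '<p:name>John Doe</p:name></p:person>')
DEFAULT = ('<person xmlns="http://example.org/people">'
           '<name>John Doe</name></person>')

def expanded_names(doc):
    # Element names in document order, with each prefix replaced by its URI.
    return [elem.tag for elem in ET.fromstring(doc).iter()]

names = [expanded_names(doc) for doc in (PREFIXED, OTHER_PREFIX, DEFAULT)]
print(names[0])
```

All three documents yield the same expanded names, so the choice of prefix carries no meaning; only the URI does.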

25 DTDs as Namespaces
The URI of a namespace may point to a DTD
A DTD defines a namespace comprising all its element names and attribute names
–But it is just a namespace – not a DTD!

26 Example
xmlns:bib="http://www.acm.org/bibliography.dtd"
xmlns:isbn="http://www.isbn-org.org/def.dtd">
Proceedings of SIGMOD 472010 1-58113-332-4
This document is invalid according to either DTD!
But the document is well formed! (e.g., in the book element, attribute names are unique)

27 Alternatively, One Namespace can be Declared as the Default
xmlns="http://www.acm.org/bibliography.dtd"
xmlns:isbn="http://www.isbn-org.org/def.dtd">
Proceedings of SIGMOD 472010 1-58113-332-4
This document is well formed but invalid according to either DTD!

28 Scope of Namespaces
The scope of a namespace declaration is the element containing the declaration and all descendant elements
–Names from that namespace must use the prefix anywhere in the scope
A declaration can be overridden in a descendant element, but in Namespaces in XML 1.0 only the default namespace can be undeclared (with xmlns="")
More than one namespace can be declared in the same scope
–At most one can be the default namespace
–All others must have unique prefixes
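
A sketch of namespace scoping, assuming Python's `xml.etree.ElementTree` and made-up URIs under `http://example.org/`. It relies on the Namespaces in XML 1.0 rules as understood here: a nested declaration overrides an outer one for its own subtree, and `xmlns=""` removes the default namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<a xmlns="http://example.org/one" xmlns:p="http://example.org/p1">
  <b/>
  <p:c/>
  <d xmlns="http://example.org/two" xmlns:p="http://example.org/p2">
    <e/>
    <p:f/>
  </d>
  <g xmlns="">
    <h/>
  </g>
</a>"""

# Expanded element names in document order.
tags = [elem.tag for elem in ET.fromstring(DOC).iter()]
for tag in tags:
    print(tag)
```

Inside `d` both the default namespace and the prefix `p` map to new URIs, and inside `g` unprefixed names belong to no namespace at all.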

29 What about Attributes?
Recall that element names and attribute names must be qualified if they belong to a nondefault namespace
Unqualified element names belong to the default namespace (if they are inside the scope)
However, an unqualified attribute does not belong to the default namespace
An unqualified attribute is processed according to the rules that apply to its element name
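
The attribute rule can be checked with a small sketch, assuming Python's `xml.etree.ElementTree`. The two namespace URIs are the ones from the bibliography example; the attribute names `id` and `number` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ('<book xmlns="http://www.acm.org/bibliography.dtd" '
       'xmlns:isbn="http://www.isbn-org.org/def.dtd" '
       'id="b1" isbn:number="1-58113-332-4"/>')

book = ET.fromstring(DOC)

# The unprefixed element name picks up the default namespace ...
element_name = book.tag
# ... but the unprefixed attribute "id" stays without a namespace.
attr_names = sorted(book.attrib)

print(element_name)
print(attr_names)
```

Only the prefixed attribute carries a namespace URI; `id` is interpreted in the context of the `book` element it is attached to.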

30 Namespaces and DTDs: The Problem
DTD syntax does not support namespaces
The previous example showed an XML document with two DTDs that were used as namespaces
–It is impossible to declare constraints that specify where fragments from each namespace can occur

31 Namespaces and DTDs: The Solutions
Use a namespace-aware schema language, or
Modify one of the two DTDs so that it will be a DTD for the new document
–Two alternatives, as illustrated on the next two slides, using the previous example

32 One Alternative
Add the required new elements to the DTD
Give the appropriate unique names to these elements using parameter entities

33 The Second Alternative
Add the required new elements to the DTD, using qualified names
Use the attribute-list declaration for the new elements to declare the namespace as a fixed value

34 Data Exchange and Data Representation in XML

35 Exchanging Relational Data
Each tuple can be wrapped inside an element
See example on the following slides

36 Two Ways of Wrapping Relations in XML Documents
projects: title budget managedBy
employees: name ssn age

37 The Project and Employee Relations in XML
Pattern recognition 10000 Joe
Joe 344556 34
Sandra 2234 35
Auto guided vehicle 70000 Sandra
:
Projects and employees are intermixed

38 Employees follow projects
Projects
Pattern recognition 10000 Joe
Auto guided vehicles 70000 Sandra
:
Employees
Joe 344556 34
Sandra 2234 35
:

39 Or without “separator” tags …
Pattern recognition 10000 Joe
Auto guided vehicles 70000 Sandra
:
Joe 344556 34
Sandra 2234 35
:
Can be done if it is clear where each employee and each project starts
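
Wrapping tuples in elements can be sketched as follows, assuming Python's `xml.etree.ElementTree`. The column names come from the two relations above and the root name `db` is the one used in the DOCTYPE declarations; the wrapper names `projects`, `project`, `employees`, and `employee` are assumptions, since the original tags were not preserved.

```python
import xml.etree.ElementTree as ET

PROJECTS = [("Pattern recognition", 10000, "Joe"),
            ("Auto guided vehicle", 70000, "Sandra")]
EMPLOYEES = [("Joe", 344556, 34), ("Sandra", 2234, 35)]

def wrap(parent, tag, columns, row):
    """Append one tuple as <tag> with one subelement per column."""
    elem = ET.SubElement(parent, tag)
    for column, value in zip(columns, row):
        ET.SubElement(elem, column).text = str(value)
    return elem

# The "employees follow projects" layout, with separator elements.
db = ET.Element("db")
projects = ET.SubElement(db, "projects")
for row in PROJECTS:
    wrap(projects, "project", ("title", "budget", "managedBy"), row)
employees = ET.SubElement(db, "employees")
for row in EMPLOYEES:
    wrap(employees, "employee", ("name", "ssn", "age"), row)

xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

Dropping the two separator elements and appending every tuple directly under `db` gives the layout without separators.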

40 DTDs for the First Two Documents
<!DOCTYPE db [... ]>
<!DOCTYPE db [... ]>

41 Wrapping Relations is not a Good Design Strategy
When designing XML documents from ER diagrams,
–ER entities are described by XML elements
–ER attributes can be described either by XML attributes or by subelements
–How to represent ER relationships?
By using the built-in relationship in XML between elements and subelements
But it is not always possible, so ID references might have to be used

42 How to use XML Attributes
XML attributes describe properties of the contents, rather than the contents
cheese fromage branza A food made …

43 Attributes (cont’d)
Another common use for attributes is to express dimensions or types
2400 96 M05-.+C$@02!G96YE<FEC...

44 Using Attributes
Jeff Cohen 04-828-1345 054-470-778 jeffc@cs.technion.ac.il
Irma Levy 03-426-1142 irmal@yourmail.com

45 It is not Always Clear When to Use Attributes
L. Simpson lisa@cs.huji.ac.il...
123 4589 L. Simpson lisa@cs.huji.ac.il...

46 Using IDs
Jeff Cohen 04-828-1345 054-470-778 jeffc@cs.technion.ac.il
Irma Levy 03-426-1142 irmal@yourmail.com
ID attributes

47 How to Represent Relationships
Two related ER entities, e.g., employees and departments, can be represented as follows
A department is an element, and the employees are subelements of the department
The relationship must be many-to-one or one-to-one
–Subelements are the “many”

48 No Multiple Copies of the Same Element (to Avoid Redundancies)
Cannot represent in this way
–A many-to-many relationship
–A relationship with more than two entities
–A binary relationship between an entity and itself or between two entities that are related by an ISA relationship
ID references must be used in the above cases

49 More Problematic Cases
If there are several many-to-one relationships between two ER entities, then only one can be represented as an element-subelement relationship
For example, employees can be subelements of their department
But the relationship between a department and its manager (who is one of the employees) must be represented by an IDREF

50 Missing Information is another Problem
If there could be an employee without a department, then employees cannot be represented as subelements of departments
–ID references have to be used

51 Inverse Relationships
XML does not have built-in inverse relationships
Must use IDREF to represent inverse relationships
For example, add an IDREF attribute to each employee element for denoting the department of the employee
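
The department and employee design can be sketched as follows, assuming Python's `xml.etree.ElementTree`. The element names and the attribute names `id`, `manager`, and `dept` are hypothetical. ElementTree does not read ID and IDREF declarations from a DTD, so the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET

# Employees are subelements of their department (many-to-one).
# The manager relationship and the inverse "department of an employee"
# relationship are expressed through ID references.
DOC = """<company>
  <department id="d1" manager="e2">
    <name>Research</name>
    <employee id="e1" dept="d1"><name>Joe</name></employee>
    <employee id="e2" dept="d1"><name>Sandra</name></employee>
  </department>
</company>"""

root = ET.fromstring(DOC)

# Index every element that carries an id attribute.
by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}

# Follow the manager reference from the department to an employee.
manager_name = by_id[by_id["d1"].get("manager")].findtext("name")

# Follow the inverse reference from an employee back to the department.
dept_of_joe = by_id[by_id["e1"].get("dept")].findtext("name")

print(manager_name, dept_of_joe)
```

A validating parser with a DTD would additionally check that every referenced ID exists; this sketch would instead raise a `KeyError` on a dangling reference.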

52 XML Schemas
W3Schools on XML Schemas

53 XML Schemas
W3C XML Schema Language, also known as the language for XML Schema Definition (XSD)
There are other proposals for XML Schemas

54 XSDs have Types
XSDs use complex types that generalize the content model of DTDs (i.e., the regular expressions for describing elements)
Many simple types, e.g., string, integer
–Generalize PCDATA and CDATA
Many facets of simple types, e.g., length, maxInclusive, maxExclusive

55 xs:sequence and xs:all
Can specify that subelements should appear in a specific order (i.e., sequence) or in any order (i.e., all)
–But xs:all is not as general as xs:sequence
Can restrict the number of occurrences of subelements, e.g., a department can have between 10 and 100 employees
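
The occurrence constraint might be written in XSD as in the sketch below. Python's standard library has no XSD validator, so the code only treats the schema as an ordinary XML document and reads the bounds back with `xml.etree.ElementTree`. The element names `department`, `name`, and `employee` and their types are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A department is a name followed by 10 to 100 employees, in that order.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="department">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="employee" type="xs:string"
                    minOccurs="10" maxOccurs="100"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)

# A schema is itself a namespace-qualified XML document.
employee = schema.find(".//{%s}element[@name='employee']" % XS)
bounds = (employee.get("minOccurs"), employee.get("maxOccurs"))

print(bounds)
```

Checking a document against this schema would need a validating processor outside the standard library.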

56 References
References are to specific elements or attributes, e.g., a reference to “person”, where “person” is the name of an element

57 More Features
Mixed content can be defined more generally, compared to DTDs
Local and global definitions of elements and types
Derived types by restriction or extension

58 XSDs and Namespaces
XSDs recognize namespaces
Easier (than with DTDs) to check validity of a document with respect to multiple schemas
–A very important feature when collecting information from multiple heterogeneous sources
–XSDs are more extensible than DTDs

59 Summary of XML
XML is a new data format, and its main virtues are
–widespread acceptance
–the (important) ability to handle semistructured data (data without schema)
DTDs provide some useful syntactic constraints on documents, but as schemas they are weak
How to store large XML documents?
How to query them?
How to map between XML and other representations?

