1
Web Data Management XML and its Syntax
2
Why XML is of Interest to Us
XML is just syntax for data. (Note: we have no syntax for relational data.) But XML is not relational: it is semistructured. This is exciting because we can translate any legacy data to XML, ship XML over the Web (HTTP), and input XML into any application. Thus: data sharing and exchange on the Web.
3
XML Data Sharing and Exchange
Enterprise systems: 3-tier, strongly typed; Distributed Object Technology (DCOM, CORBA). Web applications: multi-tier, loosely typed. Simplicity wins: exploit XML’s common illusion. Access data from multiple sources, on varied platforms, across many enterprises. Demand standards. Portals demanding query support from providers. Specific data management tasks: integrate, transform, warehouse.
[Figure: applications exchanging XML data over the Web (HTTP), with object-relational, relational, and legacy data as sources]
4
From HTML to XML HTML describes the presentation
5
HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
6
XML XML describes the content
<bibliography> <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography>
7
Relevance of XML Databases are basically metadata about actual data contained in other tables Actually describing & understanding the data poses a problem XML binds a piece of data to what it is supposed to accomplish Information is human & machine readable XML is easier to understand & implement Point 1: Although the idea of using "metadata" to describe data only recently came to the fore, the concept has been around for a long time. Databases, for instance, are made up of tables, columns, views, and such, which are nothing more than metadata about the actual data contained in other tables. These items help describe the data without knowing what is contained in it. Point 2: However, simply using names, perhaps combined with data types, does not solve all the problems present when trying to describe and ultimately understand data. Point 3: XML becomes the glue that binds what a piece of data actually is to what it is supposed to accomplish. It's used to describe all aspects of the data, ranging from near-physical properties to usage instructions, and its relationship to other data. Point 4: This information can be used for human and machine-readable purposes, one of the true advantages of XML Point 5: SGML and EDI were expensive, either because of the software needed to handle the data or the training necessary to teach these complex markup languages. Research helped create XML (a subset of SGML), which is able to describe data in a standardized way but is easier to understand and implement.
8
Relevance of XML An example describing directions from one location to another <?xml version="1.0"?> <map> <start> <addr1>100 West Morgan St</addr1> <city>Raleigh</city> <state>NC</state> <zip>27603</zip> </start> <directions> <left distance="0.11 miles">W MORGAN ST</left> <left distance="0.11 miles">S WILMINGTON ST</left> <left distance="0.44 miles">E EDENTON ST</left> <right distance="0.09 miles">N WEST ST</right> <left distance="0.02 miles">W JONES ST</left> </directions> <destination> <addr1>508 W Jones St</addr1> </destination> </map> Point 1: Another example is map.xml, which defines directions from one location to another. These examples demonstrate how you can use XML to describe data that can be understood by a human, machine, or software. We specifically used these nontechnical examples to show that XML can solve many of the difficulties of exchanging or communicating data between two parties, human or machine.
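To illustrate the claim that software can understand the same document a human reads, here is a minimal sketch that loads the directions with Python's standard `xml.etree.ElementTree` module. The variable names and the assumption that every `distance` value has the form "<number> miles" are illustrative choices, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The map.xml document from the slide, embedded as a string for the sketch.
MAP_XML = """<map>
  <start>
    <addr1>100 West Morgan St</addr1>
    <city>Raleigh</city> <state>NC</state> <zip>27603</zip>
  </start>
  <directions>
    <left distance="0.11 miles">W MORGAN ST</left>
    <left distance="0.11 miles">S WILMINGTON ST</left>
    <left distance="0.44 miles">E EDENTON ST</left>
    <right distance="0.09 miles">N WEST ST</right>
    <left distance="0.02 miles">W JONES ST</left>
  </directions>
  <destination><addr1>508 W Jones St</addr1></destination>
</map>"""

root = ET.fromstring(MAP_XML)

# Each child of <directions> is one step: the tag is the turn, the text is
# the street, and the distance attribute is assumed to be "<number> miles".
steps = [
    (turn.tag, turn.text, float(turn.get("distance").split()[0]))
    for turn in root.find("directions")
]

total_miles = round(sum(miles for _, _, miles in steps), 2)

for turn, street, miles in steps:
    print(f"Turn {turn} onto {street} for {miles} miles")
print(f"Total distance: {total_miles} miles")
```

The element names carry the meaning (left, right, start, destination), so the program needs no knowledge of the layout beyond the tag names themselves.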
9
Working with Objects Object-Oriented approach allows greater flexibility XML has object like implementations and uses ‘Schema for Object-Oriented XML’ (SOX) SOX enforces a valid Uniform Resource Identifier (URI); must include the file:// portion in the <schema> element We've heard how great XML is, how it helps with certain issues, and seen a few examples, but what does this mean? What types of difficulties does it solve, how does it solve them, and why is XML the solution for these issues? These are questions that we need to ask ourselves before choosing XML as our solution-providing method. We need to be sure it is the right means to provide integration points with our applications, or descriptions and structure to our content. Let us talk about some specific uses of XML and how they work. Point 1: Those of you familiar with the principles and practices of object-oriented programming will have added respect, and probably motivation, for creating items in an object-oriented manner. Doing so allows for a degree of flexibility and potential component reuse that is not possible when operating in a non-object environment. Point 2: XML is no different and has its own object-like implementations and uses. Point 3: One such limitation of XML was the inability to present data in an object-oriented manner. To help fill this void, a Note at the W3C was published titled "Schema for Object-Oriented XML (SOX)". Point 4: Because SOX enforces a valid Uniform Resource Identifier (URI), we must include the file:// portion of the URL in the <schema> element. Let us consider an example to understand this better..
10
Working with Objects <?xml version = "1.0" encoding = "UTF-8"?> <!DOCTYPE schema SYSTEM "urn:x-commerceone:document:com:commerceone:xdk:xml:schema.dtd$1.0"> <schema uri = "file:///S:/home_computer.sox" soxlang-version = "V0.2.2"> <elementtype name = "home_computer"> <model> <sequence> <element type = "monitor"/> <element type = "housing"/> <element type = "speakers"/> <element type = "keyboard"/> <element type = "mouse"/> </sequence> </model> </elementtype> <elementtype name = "monitor"> <string/> Using XML, we create a home computer object that contains a monitor, housing (CPU, RAM, hard disk, and so forth), speakers, a keyboard, and a mouse. The definition of this schema is partly shown (due to restricted space) on the slide.. along with a visual representation of this schema on the right. This data model is fine if we only define home computers, but what happens if we want to define a work computer? Using SOX we can build such an object by extending our base <home_computer> element as shown on the following slide..
11
Working with Objects <?xml version = "1.0" encoding = "UTF-8"?> <!DOCTYPE schema SYSTEM "urn:x-commerceone:document:com:commerceone:xdk:xml:schema.dtd$1.0"> <schema uri = "file:///S:/home_computer.sox" soxlang-version = "V0.2.2"> <join system = "file:///S:/home_computer.sox"/> <elementtype name = "work_computer"> <extends type = "home_computer"> <append> <sequence> <element type = "scanner"/> <element type = "zip_drive"/> <element type = "printer"/> </sequence> </append> </extends> </elementtype> <elementtype name = "scanner"> <model> <string/> </model> <elementtype name = "zip_drive"> <elementtype name = "printer"> </schema> In our home_computer.sox schema, we define a new element called <work_computer>, and say it extends <home_computer>. Now, we are able to add items like a printer, scanner, and zip drive. This new schema, work_computer.sox, now inherits all the elements in the original schema and requires us to only define the additions.
12
Working with Objects
[Figure: the home_computer.sox schema compared to the work_computer.sox schema]
13
Working with Objects
SOX-compliant parsers pull in the required parent schemas <?xml version = "1.0" encoding = "UTF-8"?> <?soxtype file:///S:/work_computer.sox?> <work_computer> <monitor>15 inch</monitor> <housing> <cpu>1 GHz</cpu> <ram>256 MB</ram> <disk_space>60 GB</disk_space> <modem>56k</modem> </housing> <speakers>JBL</speakers> <keyboard>Microsoft</keyboard> <mouse>Microsoft</mouse> <scanner>Microtek</scanner> <zip_drive>100 MB</zip_drive> <printer>HP</printer> </work_computer> Now that the work computer schema is defined, we can create an instance document of this schema. Because SOX-compliant parsers will pull in the necessary parent schemas, we only need to reference the work_computer.sox schema as shown here.. Object modeling allows you to reuse components of XML schema dialects, like SOX or XSD, which can lead to increased productivity within your development environment by cutting down on redundant work.
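The inheritance idea can be checked by hand with a short sketch. This is not a SOX parser: Python's standard library knows nothing about SOX, so the two lists below simply restate the inherited and appended element names from the schemas on the earlier slides, and the `soxtype` processing instruction is skipped by the parser.

```python
import xml.etree.ElementTree as ET

# Instance document from the slide (XML declaration omitted for brevity).
INSTANCE = """<?soxtype file:///S:/work_computer.sox?>
<work_computer>
  <monitor>15 inch</monitor>
  <housing>
    <cpu>1 GHz</cpu> <ram>256 MB</ram>
    <disk_space>60 GB</disk_space> <modem>56k</modem>
  </housing>
  <speakers>JBL</speakers>
  <keyboard>Microsoft</keyboard>
  <mouse>Microsoft</mouse>
  <scanner>Microtek</scanner>
  <zip_drive>100 MB</zip_drive>
  <printer>HP</printer>
</work_computer>"""

# Hand-written restatement of the two schemas (an assumption of this sketch).
HOME_COMPUTER = ["monitor", "housing", "speakers", "keyboard", "mouse"]
WORK_COMPUTER_EXTRAS = ["scanner", "zip_drive", "printer"]

root = ET.fromstring(INSTANCE)
child_tags = [child.tag for child in root]

# An instance of the extended type carries the inherited sequence first,
# followed by the appended sequence.
inherited = child_tags[:len(HOME_COMPUTER)]
appended = child_tags[len(HOME_COMPUTER):]
follows_schema = (inherited == HOME_COMPUTER and appended == WORK_COMPUTER_EXTRAS)
print("inherited:", inherited)
print("appended:", appended)
print("matches extended model:", follows_schema)
```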
14
Application Messaging
XML and SOAP enabled "application messaging". XML is used to describe the API to a Web Service. Software applications should be able to communicate in a near-automated fashion. XML describes the Web Services themselves. Applications can query other applications to judge capability before making a request. Point 1: The term "messaging" is often thought of as messaging between friends and family, such as with MSN Messenger, and not between full-blown enterprise-level applications, which could in turn connect your data repositories to those of your partners and affiliates. XML changed all that, and with the help of standards like SOAP, application messaging is now a reality. Point 2: XML is used to describe the Application Programming Interface (API) to a Web Service, which allows applications to partially automate the process by which they communicate. The requesting application can understand the capabilities of the responding server and determine if it is compatible with its own data structure and type. Point 3: The purpose of application messaging is for software applications, potentially in geographically different locations, to communicate in a near-automated fashion. Communicating usually involves receiving a request, processing that request, and returning a result to the requesting application. XML emerges as the language of choice for describing not only the request, but also the response. This description might contain the path or route of the request, the type of information requested, and the actual data returned (marked up in XML, of course). Point 4: With the shift to Web Services as the next generation of Application Service Providers, XML is used not only for application-to-application messaging, but also for describing the services themselves. This description could contain anything from the type of data that can be requested to the available result options.
Point 5: Following this model, applications can query other applications to understand their capabilities before making a request.
15
Application Messaging
NFQuery.dtd <?xml version='1.0' encoding='UTF-8' ?> <!ELEMENT query (news)> <!ELEMENT news EMPTY> <!ATTLIST news date CDATA #REQUIRED type (global | local | financial | sports | travel | weather ) #REQUIRED what (count | headers | all ) #REQUIRED limit CDATA #IMPLIED > Joe_query.xml <?xml version = "1.0" encoding = "UTF-8"?> <!DOCTYPE query SYSTEM "NFQuery.dtd"> <query> <news date = " " type = "sports" what = "count"/> </query> A Simple Example of Messaging Point 1: News Feed, Inc. builds a system that returns the daily news to a requesting application. The system has everything from sports, financial news, and local news, to world, weather, and travel news. The application can be queried with a date to see whether it has any relevant news for that day. The query also contains the type of information you're looking for, what you want returned (count, headers, all data), and how many results should be returned. The schema used to describe this query is shown in NFQuery.dtd. Point 2: Joe's Sports Tips, a local newspaper that uses News Feed's data, only wants to retrieve sports news. Because of this, Joe's first sends a query to the News Feed system to see how many sports headlines it has for a given date. The query, Joe_query.xml, requests the number of sports headlines for October 10, 2001.
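A requesting application such as Joe's could assemble the query programmatically. The sketch below is one possible way to do that with Python's standard library; the function name `build_query` and the `2001-10-10` date format are assumptions made for illustration (the DTD only says `date` is CDATA), and the enumerated values are copied from NFQuery.dtd because the standard library does not validate against DTDs.

```python
import xml.etree.ElementTree as ET

# Enumerations copied by hand from the ATTLIST in NFQuery.dtd.
NEWS_TYPES = {"global", "local", "financial", "sports", "travel", "weather"}
WHAT_VALUES = {"count", "headers", "all"}


def build_query(date, news_type, what, limit=None):
    """Return a <query> document as a string (hypothetical helper)."""
    if news_type not in NEWS_TYPES:
        raise ValueError(f"type must be one of {sorted(NEWS_TYPES)}")
    if what not in WHAT_VALUES:
        raise ValueError(f"what must be one of {sorted(WHAT_VALUES)}")
    attrs = {"date": date, "type": news_type, "what": what}
    if limit is not None:          # limit is #IMPLIED, so it may be omitted
        attrs["limit"] = str(limit)
    query = ET.Element("query")
    ET.SubElement(query, "news", attrs)   # <news> is declared EMPTY
    return ET.tostring(query, encoding="unicode")


# The date string format is an assumption of this sketch.
count_query = build_query("2001-10-10", "sports", "count")
print(count_query)
```

Checking the enumerations before sending mirrors what a validating parser would do on the receiving side.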
16
Application Messaging
NFResponse.dtd <?xml version='1.0' encoding='UTF-8' ?> <!ELEMENT results (count | headers | all)> <!ATTLIST results date CDATA #REQUIRED type (global | local | financial | sports | travel | weather ) #REQUIRED > <!ELEMENT count (#PCDATA)> <!ELEMENT headers (#PCDATA)> <!ELEMENT all (headline+)> <!ELEMENT headline (#PCDATA)> Joe_query_response.xml <?xml version = "1.0" encoding = "UTF-8"?> <!DOCTYPE results SYSTEM "NFResponse.dtd"> <results date = " " type = "sports"> <count>53</count> </results> Point 1: The next part of the process is for News Feed to return a response to Joe's. The format of the response is governed by the NFResponse.dtd schema. The response shows 53 items in sports news on October 10.
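On the receiving side, Joe's application has to pull the count out of the response before deciding what to ask for next. A minimal sketch, assuming the response arrives as a string; the helper name `read_count`, the date value, and the cap of 10 articles are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Response shaped like Joe_query_response.xml; the date value is assumed.
RESPONSE = (
    '<results date="2001-10-10" type="sports">'
    "<count>53</count>"
    "</results>"
)


def read_count(response_xml):
    """Return the integer inside <count>, or raise if the shape is wrong."""
    root = ET.fromstring(response_xml)
    count = root.find("count")
    if root.tag != "results" or count is None:
        raise ValueError("expected <results> containing <count>")
    return int(count.text)


available = read_count(RESPONSE)
# Ask for at most 10 articles, or fewer if fewer are available.
limit = min(available, 10)
print(f"{available} items available, requesting {limit}")
```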
17
Application Messaging
Joe_get_query.xml <?xml version = "1.0" encoding = "UTF-8"?> <!DOCTYPE query SYSTEM "NFQuery.dtd"> <query> <news date=" " type="sports" what="all" limit="10"/> </query> Point 1: Now that Joe's knows how many items it can access, the system decides to request the first 10 articles. Again using the NFQuery.dtd data model, it changes the initial query to reflect that it wants 10 items and the entire article for these items.
18
Process Modeling XML used to model process & workflow
<map> <start> <addr1>100 West Morgan St</addr1> <city>Raleigh</city> <state>NC</state> <zip>27603</zip> </start> <directions> <left distance="0.11 miles">W MORGAN ST</left> <left distance="0.11 miles">S WILMINGTON ST</left> <left distance="0.44 miles">E EDENTON ST</left> <right distance="0.09 miles">N WEST ST</right> <left distance="0.02 miles">W JONES ST</left> </directions> <destination> <addr1>508 W Jones St</addr1> </destination> </map> Point 1: A relatively new use of XML is to model processes and workflow. Using XML to model processes is not just a means by which you can describe how something works. Because it was implemented in XML, it inherits all the abilities of the XML language, such as validation. This provides a second level of verification when it comes to ensuring that processes are defined according to the rules and objectives set forth.
19
The .NET Framework Shift from individual Web sites or devices to an integrated cluster of devices & services People will control what, when, and how information is delivered to them HTML based presentation is augmented by XML-based information XML-based .NET programming model forges the idea of XML-based Web Services Web services use protocols like HTTP & XML Point 1: The fundamental idea behind the .NET Framework is that the focus is shifting from individual Web sites or devices connected to the Internet to constellations of computers, devices, and services that work together to deliver richer, broader solutions. Point 2: People will control what, when, and how information is delivered to them. Computers, devices, and services will collaborate with each other to provide rich services instead of being isolated islands where the user provides the only integration. Businesses will offer their products and services in a way that lets customers seamlessly embed them in their own electronic fabric. Point 3: The .NET Framework will help drive a transformation in the Internet that will see HTML-based presentation augmented by programmable XML-based information. Point 4: The loosely coupled XML-based .NET programming model introduces the concept of creating XML-based Web Services. Whereas today's Web sites are handcrafted and don't work with other sites without significant additional development, the .NET programming model provides a built-in mechanism to build any Web site or service so that it will coalesce and collaborate seamlessly with others. Point 5: Web Services do not use object model-specific protocols such as DCOM, RMI, or IIOP. Web Services take a different approach: they communicate using ubiquitous Web protocols and data formats such as HTTP and XML. Any system supporting these Web standards will be able to support Web Services.
20
XML within .NET XML obvious choice to represent commands and typed data XML standard metalanguage for describing data SOAP is an industry standard for using XML Service Contract Language (SCL); XML grammar for documenting Web Service contracts Businesses can create a variety of value-added applications by combining Web Services Point 1: XML is the obvious choice for defining a standard yet extensible language to represent commands and typed data, and it's only appropriate that Windows .NET has adopted its use widely within its framework. Point 2: While rules for representing commands and typed data using other techniques (such as encoding this information in a query string) could be defined, XML is specifically designed as a standard metalanguage for describing data Point 3: SOAP is an industry standard for using XML to represent data and commands in an extensible way. SOAP is being further developed at W3C under the name of XML Protocol (XMLP). Point 4: XML is the enabling technology for the Web Service contracts. The Service Contract Language (SCL) is an XML grammar for documenting Web Service contracts. Because SCL is XML-based, contracts are easy for both developers and developer tools to create and interpret. Point 5: Obviously, the advantages of the model are many. Companies can not only more easily integrate internal applications, but they can also access services offered by other businesses. By combining Web Services on the Internet, companies can create a wide variety of value-added applications.
21
Required Knowledge
XML is simply metadata used to describe a markup language. Knowledge of XPath or SOAP is helpful. You may use a text editor or some IDE. You can even write your own parser for validating instance documents. Refer to help resources on the web. Point 1: Few people learn XML as their first markup-based language; most have had previous experience with SGML or HTML/XHTML. Even if you have no prior experience, XML is an easy language to learn conceptually. It is simply metadata used to describe a markup language. But conceptual understanding is not the biggest problem in knowing and using XML. Point 2: Saying that you know XML states that you can understand the constructs of a given XML-based language, but does not necessarily mean that you understand the purpose and usage of that language. While learning XML you should take the time to learn XML-based languages created with it. For instance, you should explore other standards such as Namespaces in XML, XPath, or SOAP. Your usage of XML will surely go beyond just the details of the language itself and will involve many of the other XML-based implementations. Point 3: Using XML for personal or professional uses requires software. You may opt to use a text editor, like Notepad, in writing your schemas and instance documents instead of performing these tasks in some kind of Integrated Development Environment (IDE). Point 4: You may also, in accordance with the published standards, opt to write your own parser for validating instance documents. Although this is unlikely, the application you build will be larger if a prebuilt parser is included. This could be anything from a type of user agent to a server that processes XML documents for insertion into a database. Point 5: Several places on the Web offer you help, and several methods can help you get it. These resources alone give developers a leg up on difficulties they might come across. These sites are particularly useful..
Finding help is not always easy, so if all else fails you can fall back on your favorite search engine and a few keywords to see if any additional sites cover your topic of interest.
22
Goals of XML The goals behind creating the XML language:
should be compatible with SGML; should support a variety of applications; should be easily usable over the Internet; the XML design should be prepared quickly; the design of XML shall be formal and concise; XML documents should be reasonably clear; XML documents should be uncomplicated to create; programs processing XML documents should be easy to write; terseness in XML markup is of minimal importance; optional features should be kept to a bare minimum.
Point 1: Before you can fully understand and appreciate what XML introduces into the application integration equation, we must take a moment to talk about the initial goals behind creating the language. The original XML Working Group at the W3C was formed out of the SGML Editorial Review Board, and it made 10 design goals while creating the XML language:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
You should be exposed to these items because they lay the foundation for the design and objectives of the language. You should make sure your objectives are in line with these design goals.
23
The XML Language Elements: represent tags or language that you create with XML <!ELEMENT name type> A customer data model.. Point 1: Elements represent the tags, or language, that you create with XML. To define an element in a DTD you use the following syntax. <!ELEMENT name type> The name is the element name you want to define and the type is the type of content the element contains. This could be text, other elements, or a combination of the two. Point 2: Here's an example to give you a better idea of what this means. Suppose you want to create a customer schema that includes their name and contact information. To provide another level of granularity we break the name into first, last, and middle names, and contact into address and phone. Taking it one more step, we will break address into street, city, state, and zip, while we break phone into home, work, and mobile.
24
The XML Language We first define our customer element: <!ELEMENT customer (name , contact)> You can impose some rules on this, such as having name OR contact. Defining <name> and <contact> is similar, as they are parents of child elements: <!ELEMENT name (first , middle , last)> <!ELEMENT contact (address , phone)> <!ELEMENT address (street , city , state , zip)> <!ELEMENT phone (home , work , mobile)> Point 1: Element declarations in a DTD may appear in any order, but it is natural to begin with the root, so we first define our customer element. This element contains two child elements as its content, so the DTD representation of this XML element takes the following form: <!ELEMENT customer (name , contact)> Point 2: In this definition, for the document to be valid (more on what valid is later in the chapter), it must contain a single instance of the <name> element followed by a single instance of the <contact> element. You can impose some rules on this, such as having name OR contact, or making one or both optional. Point 3: Defining <name> and <contact> is similar because they are parent elements of child elements. So are <address> and <phone>. The definitions for these can be represented as follows: <!ELEMENT name (first , middle , last)> <!ELEMENT contact (address , phone)> <!ELEMENT address (street , city , state , zip)> <!ELEMENT phone (home , work , mobile)>
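To see what an instance of this customer model looks like, the sketch below restates the five declarations as a Python dictionary and builds a matching document from it. The dictionary, the `build` helper, and all sample values are hypothetical; only the element names and their order come from the DTD on the slide.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the DTD: parent element -> ordered child elements.
# Elements that are not keys here are treated as text-only leaves.
CONTENT_MODEL = {
    "customer": ["name", "contact"],
    "name": ["first", "middle", "last"],
    "contact": ["address", "phone"],
    "address": ["street", "city", "state", "zip"],
    "phone": ["home", "work", "mobile"],
}


def build(tag, values):
    """Recursively create `tag`, filling leaves from the `values` dict."""
    elem = ET.Element(tag)
    children = CONTENT_MODEL.get(tag)
    if children is None:
        elem.text = values.get(tag, "")
    else:
        for child in children:      # keep the order the DTD requires
            elem.append(build(child, values))
    return elem


# Made-up sample data, keyed by leaf element name.
SAMPLE = {
    "first": "Ada", "middle": "M", "last": "Lee",
    "street": "1 Main St", "city": "Raleigh", "state": "NC", "zip": "27603",
    "home": "555-0100", "work": "555-0101", "mobile": "555-0102",
}

customer = build("customer", SAMPLE)
print(ET.tostring(customer, encoding="unicode"))
```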
25
XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: <red></red>, abbreviated <red/>
an XML document has a single root element
26
XML Syntax Another example:
<db>
  <book>
    <title> Complete Guide to DB2 </title>
    <author> Chamberlin </author>
  </book>
  <book>
    <title> Transaction Processing </title>
    <author> Bernstein </author>
    <author> Newcomer </author>
  </book>
  <publisher>
    <name> Morgan Kaufmann </name>
    <state> CA </state>
  </publisher>
</db>
27
The XML Tree
db
  book
    title: “Complete Guide to DB2”
    author: “Chamberlin”
  book
    title: “Transaction Processing”
    author: “Bernstein”
    author: “Newcomer”
  publisher
    name: “Morgan Kaufmann”
    state: “CA”
Tags on nodes; data values on leaves.
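The tree view can be reproduced by parsing the db document and walking it depth first. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `walk` generator is an illustrative helper, not something from the slides.

```python
import xml.etree.ElementTree as ET

DB_XML = """<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>"""


def walk(elem, depth=0):
    """Yield (depth, tag, text) for elem and all its descendants, in order."""
    text = (elem.text or "").strip()   # inner nodes only hold whitespace
    yield depth, elem.tag, text
    for child in elem:
        yield from walk(child, depth + 1)


nodes = list(walk(ET.fromstring(DB_XML)))
for depth, tag, text in nodes:
    label = f"{tag}: {text}" if text else tag
    print("  " * depth + label)

# Leaves are exactly the nodes that carry a data value.
leaves = [(tag, text) for _, tag, text in nodes if text]
```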
28
XML Components An XML file normally consists of three types of markup, the first two of which are optional: 1. An XML declaration, written in processing instruction (PI) syntax, identifying the version of XML being used, the way in which it is encoded, and whether it references other files or not, e.g. <?xml version="1.0" encoding="UCS2" standalone="yes"?>
29
XML Components 2. A document type declaration (DTD)
either contains the formal markup declarations in its internal subset (between square brackets) or references a file containing the relevant markup declarations (the external subset), e.g.: <!DOCTYPE memo SYSTEM "
30
XML Components 3. A fully-tagged document instance which
consists of a root element, whose element type name must match that assigned as the document type name in the document type declaration, within which all other markup is nested.
31
XML Characteristics
Validity: a document is valid if all three components are present and the document instance conforms to the rules defined in the document type definition.
Well-formedness: a document is well-formed if each element is properly nested within its parent elements, each element has matching start and end tags, and each attribute is specified as an attribute name followed by a value indicator (=) and a quoted string.
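Well-formedness is what any XML parser checks before doing anything else, so it can be tested by simply attempting to parse. A small sketch with Python's standard library follows; the `is_well_formed` name is an illustrative choice, and note that this parser is non-validating, so it says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<a><b></b></a>",
    "overlapping tags": "<a><b></a></b>",
    "unquoted attribute": "<a x=1/>",
    "quoted attribute": '<a x="1"/>',
    "two root elements": "<a/><b/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```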
32
XML Components Six kinds of markup can occur in an XML document: elements, entity references, comments, processing instructions, marked sections, and document type declarations. An XML document primarily consists of a strictly nested hierarchy of elements with a single root. Elements can contain character data, child elements, or a mixture of both. In addition, they can have attributes.
33
XML Components Child character data and child elements are strictly ordered; attributes are not. For example: <?xml version="1.0" ?> <Book Author="Anonymous"> <Title>Sample Book</Title> <Chapter id="1"> This is chapter 1. It is not very long or interesting. </Chapter> <Chapter id="2"> This is chapter 2. Although it is longer than chapter 1, it is not any more interesting. </Chapter> <comments/> </Book>
34
“Types” (or “Schemas”) for XML
Document Type Definition – DTD Define a grammar for the XML document we use it as substitute for types/schemas Will be replaced by XML-Schema
35
Document Type Definition (DTD)
The Document Type Definition(DTD) is either contained in a <!DOCTYPE> tag, contained in an external file and referenced from a <!DOCTYPE> tag, or both. PCDATA means Parsed Character Data (a mouthful for string) <!DOCTYPE Book [ <!ELEMENT Book (Title, Chapter+,comments?)> <!ATTLIST Book Author CDATA #REQUIRED> <!ELEMENT Title (#PCDATA)> <!ELEMENT Chapter (#PCDATA)> <!ATTLIST Chapter id ID #REQUIRED> <!ELEMENT comments EMPTY> ]>
36
An Example DTD <!DOCTYPE db [ <!ELEMENT db ((book|publisher)*)> <!ELEMENT book (title,author*,year?)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT year (#PCDATA)> <!ELEMENT publisher (#PCDATA)> ]> PCDATA means Parsed Character Data (a mouthful for string)
37
DTDs as Grammars
db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= string
A DTD is an EBNF (Extended BNF) grammar. An XML tree is precisely a derivation tree. XML documents that have a DTD and conform to it are called valid.
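Because each content model is a regular expression over child element names, the grammar view can be demonstrated with ordinary regular expressions. The sketch below hand-translates the db grammar into Python `re` patterns and checks a parsed tree against it; this is an illustration of the idea, not a DTD validator, and the `conforms` helper and pattern encoding (each child tag followed by one space) are assumptions of the sketch.

```python
import re
import xml.etree.ElementTree as ET

# Content models translated by hand: every child tag is written as
# "tag " so that sequences of tags can be matched as one string.
CONTENT_MODELS = {
    "db": r"(book |publisher )*",
    "book": r"title (author )*(year )?",
    "title": r"",        # string-only elements have no child elements
    "author": r"",
    "year": r"",
    "publisher": r"",
}


def conforms(elem):
    """True if elem and all descendants match their content models."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:                       # undeclared element
        return False
    child_sequence = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, child_sequence) is None:
        return False
    return all(conforms(child) for child in elem)


good = ET.fromstring(
    "<db><book><title>T</title><author>A</author><author>B</author>"
    "<year>1998</year></book><publisher>P</publisher></db>"
)
bad = ET.fromstring(
    "<db><book><author>A</author><title>T</title></book></db>"
)
print(conforms(good), conforms(bad))
```

The second document fails only because author precedes title, which the sequence (title,author*,year?) does not allow.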
38
DTD Vs XML Schema DTD: old-style typing, still widely used
XML schema: more modern, used e.g. in Web services DTD: <!ELEMENT note (to, from, heading, body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>
39
DTD Vs XML Schema The same structure in XML schema (an XML dialect)
<?xml version="1.0"?> <xs:schema xmlns:xs=" <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string" minOccurs="1" maxOccurs="1"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
40
Elements 1) An element is defined as a group of one or more subelements/subgroups, character data, EMPTY, or ANY. For example: Group: <!ELEMENT A (B, C)> Character data: <!ELEMENT A (#PCDATA)> Empty: <!ELEMENT A EMPTY> Any: <!ELEMENT A ANY>
41
Elements 2) Elements defined as groups of subelements/subgroups constitute non-terminals in the language. Elements defined as character data, EMPTY, or ANY constitute terminals. For example: <!-- Element A is a non-terminal. --> <!ELEMENT A (B)> <!-- Element B is a terminal. --> <!ELEMENT B (#PCDATA)> It is legal to define a language containing non-terminals that never resolve to terminals, such as one with purely circular definitions. It is generally impossible and/or useless to create any valid documents for such languages.
42
Elements 3) Groups can be either a sequence or choice of subelements and/or subgroups. For example: Sequence: <!-- Element A consists of a single element B. --> <!ELEMENT A (B)> <!-- Element A consists of element B followed by element C. --> <!ELEMENT A (B, C)> <!-- Element A consists of a sequence, including a choice subgroup. --> <!ELEMENT A (B, (C | D), E)> Choice: <!-- Element A consists of either element B or element C. --> <!ELEMENT A (B | C)> <!-- Element A consists of a choice, including a sequence subgroup. --> <!ELEMENT A (B | C | (D, E))>
43
Elements 4) Optional (?), one-or-more (+), and zero-or-more (*) operators can be applied to groups, subgroups, and subelements. For example: Optional: <!-- Subelement B is optional. --> <!ELEMENT A (B?, C)> One or more: <!-- Subgroup (C | D) occurs one or more times. --> <!ELEMENT A (B, (C | D)+, E)> Zero or more: <!-- Group (B, C) occurs zero or more times, i.e. A can be empty. --> <!ELEMENT A (B, C)*>
44
Elements 5) Elements containing character data can be declared as containing only character data: <!ELEMENT A (#PCDATA)> or as containing a mixture of character data and elements: <!ELEMENT A (#PCDATA | B | C)*> The latter case is an example of “mixed content”. "PCDATA" in the declarations is short for "Parsed Character DATA".
45
Elements 6) EMPTY means that the element has no child elements or character data. 7) ANY means that the element can contain zero or more child elements of any declared type, as well as character data. It is therefore a shorthand for mixed content containing all declared elements.
46
Attributes 1) Elements can have zero or more attributes. Attributes are name-value pairs that occur inside tags after the element name, as in <div class="preface">. In XML, all attribute values must be quoted. Attributes are alternative ways to represent data. For example: <!ELEMENT A (#PCDATA)> <!-- Declare an attribute a for element A --> <!ATTLIST A a CDATA #IMPLIED>
47
Attributes <book price = "55" currency = "USD"> <title> Complete Guide to DB2 </title> <author> Chamberlin </author> <year> 1998 </year> </book> price, currency are called attributes
48
Replacing Attributes with Elements
<book> <title> Complete Guide to DB2 </title> <author> Chamberlin </author> <year> 1998 </year> <price> 55 </price> <currency> USD </currency> </book> attributes are alternative ways to represent data
49
Attributes 2) A single ATTLIST statement can declare multiple attributes for the same element. Multiple ATTLIST statements can declare attributes for the same element. That is, the following are equivalent: Single ATTLIST statement declaring multiple attributes for an element: <!-- Element A has attributes a and b --> <!ATTLIST A a CDATA #IMPLIED b CDATA #IMPLIED> Multiple ATTLIST statements declaring attributes for the same element: <!ATTLIST A a CDATA #IMPLIED> <!ATTLIST A b CDATA #IMPLIED>
50
Attributes 3) Attributes can be optional, required, or have a fixed value. Optional attributes can have a default; fixed attributes must have a default. For example: Optional without a default: <!-- Element A has an attribute a. #IMPLIED = "optional, no default" --> <!ATTLIST A a CDATA #IMPLIED> Optional with a default: <!-- If attribute a is not provided, a default of "aaa" will be used. --> <!ATTLIST A a CDATA "aaa"> Required: <!ATTLIST A a CDATA #REQUIRED> Fixed: <!-- The value of attribute a is always "aaa" --> <!ATTLIST A a CDATA #FIXED "aaa">
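The effect of these defaults can be observed with a parser that reads the internal DTD subset. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose underlying expat parser is non-validating but, to the best of my knowledge, reports attribute defaults declared in the internal subset; the helper name `attr_a` is illustrative. It will not enforce #REQUIRED or reject a conflicting #FIXED value, since that needs a validating parser.

```python
import xml.etree.ElementTree as ET


def attr_a(attlist_decl, instance="<A/>"):
    """Parse `instance` under the given ATTLIST and return attribute a."""
    doc = (
        "<!DOCTYPE A [<!ELEMENT A (#PCDATA)> "
        + attlist_decl
        + "]>"
        + instance
    )
    return ET.fromstring(doc).get("a")


# Optional without a default: the attribute is simply absent.
implied = attr_a("<!ATTLIST A a CDATA #IMPLIED>")

# Optional with a default: the parser supplies "aaa" when a is omitted...
defaulted = attr_a('<!ATTLIST A a CDATA "aaa">')

# ...but a value given in the document takes precedence over the default.
overridden = attr_a('<!ATTLIST A a CDATA "aaa">', '<A a="zzz"/>')

# Fixed: the declared value is supplied when a is omitted.
fixed = attr_a('<!ATTLIST A a CDATA #FIXED "aaa">')

print(implied, defaulted, overridden, fixed)
```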
51
Attributes 4) Each attribute has a type: Character data: <!ATTLIST A a CDATA #IMPLIED>
A user-defined enumerated type: <!-- Attribute a uses a simple enumeration. --> <!ATTLIST A a (yes | no) #IMPLIED> <!-- Attribute a uses an enumeration of notation types. --> <!ATTLIST A a NOTATION (ps | pdf) #IMPLIED>
52
Attributes ID, IDREF: These attributes point from one element to another. The value of the IDREF attribute on the pointing element is the same as the value of the ID attribute on the pointed-to element. <!-- Attribute id gives the ID of element A --> <!ATTLIST A id ID #IMPLIED> <!-- Attribute ref points to the ID of another element --> <!ATTLIST A ref IDREF #IMPLIED>
53
Oids and References
<person id="o555"> <name> Jane </name> </person> <person id="o456"> <name> Mary </name> <children idref="o123 o555"/> </person> <person id="o123" mother="o456"> <name> John </name> </person> oids and references in XML are just syntax
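Because oids and references are just syntax, it is up to the application to follow them. The sketch below shows one way to do this with Python's standard xml.etree.ElementTree. The people wrapper element and the helper names are assumptions added for illustration, and the last person element is closed so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# <people> is a hypothetical root so the three persons form one document.
DOCUMENT = """<people>
  <person id="o555"><name> Jane </name></person>
  <person id="o456"><name> Mary </name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(DOCUMENT)

# Index every person by its id attribute so references can be resolved.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(oid):
    return by_id[oid].findtext("name").strip()


def children_of(oid):
    """Follow the whitespace-separated references in <children idref=...>."""
    children = by_id[oid].find("children")
    if children is None:
        return []
    return [name_of(ref) for ref in children.get("idref").split()]


def mother_of(oid):
    """Follow the single reference in the mother attribute, if present."""
    ref = by_id[oid].get("mother")
    return name_of(ref) if ref is not None else None


print(children_of("o456"))
print(mother_of("o123"))
```

A validating parser with a DTD that declares id as ID and the reference attributes as IDREF or IDREFS would additionally check that every reference points at an existing id; this sketch performs no such check.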
54
Attributes ENTITY, ENTITIES. These attributes point to external data in the form of unparsed entities. NMTOKEN, NMTOKENS. These attributes have single/multiple tokens as values. <!-- Attribute a points to a single unparsed entity --> <!ATTLIST A a ENTITY #IMPLIED> <!-- Attribute b points to multiple unparsed entities --> <!ATTLIST A b ENTITIES #IMPLIED> <!ATTLIST A a NMTOKEN #IMPLIED> <!ATTLIST A b NMTOKENS #IMPLIED>
55
Entity Declarations Entity declarations allow you to associate a name with some other fragment of the document. That construct can be a chunk of regular text, a chunk of the document type declaration, or a reference to an external file containing either text or binary data. <!ENTITY ATI "ArborText, Inc."> <!ENTITY boilerplate SYSTEM "/standard/legalnotice.xml"> <!ENTITY ATIlogo SYSTEM "/standard/logo.gif" NDATA GIF87A>
56
Entity Declarations There are three kinds of entities: internal, external, and parameter. Internal entities: the replacement text is stored in the declaration. Using &ATI; anywhere in the document inserts “ArborText, Inc.” at that location. Character references can be used to insert arbitrary Unicode characters. They take one of two forms: decimal references, &#8478;, and hexadecimal references, &#x211E;. Both of these refer to character number U+211E from Unicode.
57
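Entity Declarations Internal entities can include references to other internal entities, but it is an error for them to be recursive. Example: <element> this is less than &lt; </element> The XML specification predefines five internal entities:
Declaration                    Reference   Symbol
<!ENTITY lt "&#38;#60;">       &lt;        <
<!ENTITY gt "&#62;">           &gt;        >
<!ENTITY amp "&#38;#38;">      &amp;       &
<!ENTITY apos "&#39;">         &apos;      '
<!ENTITY quot "&#34;">         &quot;      "
The lt and amp declarations are double-escaped so that their replacement text remains well-formed when it is reparsed. A numeric character reference such as &#38; inserts a Unicode character by number.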
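Entity and character reference expansion can be observed directly. The sketch below assumes Python's standard xml.etree.ElementTree and a hypothetical doc element; it declares the ATI entity in the internal subset and mixes it with predefined entities and both forms of character reference.

```python
import xml.etree.ElementTree as ET

# <doc> is a hypothetical element; ATI is the internal entity from the slides.
DOCUMENT = """<!DOCTYPE doc [
<!ENTITY ATI "ArborText, Inc.">
]>
<doc>&ATI; &lt; &amp; &#8478; &#x211E;</doc>"""

# The parser replaces every reference before the application sees the text.
text = ET.fromstring(DOCUMENT).text
print(text)

# The decimal and hexadecimal references name the same character, U+211E.
decimal_char, hex_char = text.split()[-2:]
print(decimal_char == hex_char == "\u211e")
```

The application never sees &ATI; or &lt; themselves, only their replacement text.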
58
Entity Declarations External Entities
Using &boilerplate; will insert the contents of the file /standard/legalnotice.xml The XML processor will parse the content of that file as if its content had been typed at the location of the entity reference. The entity ATIlogo is also an external entity, but its content is binary. The ATIlogo entity can only be used as the value of an ENTITY (or ENTITIES) attribute (on a graphic element, perhaps).
59
Entity Declarations Parameter Entities
Parameter entities can only occur in the document type declaration. A parameter entity is identified by placing “% ” (percent-space) in front of its name in the declaration. The percent sign is also used in references to parameter entities, instead of the ampersand. Parameter entity references are immediately expanded in the document type declaration and their replacement text is part of the declaration, whereas normal entity references are not expanded.
60
Notation Declarations
Notation declarations identify specific types of external binary data. This information is passed to the processing application, which may make whatever use of it it wishes. A typical notation declaration is: <!NOTATION GIF87A SYSTEM "GIF"> Comments: comments begin with <!-- and end with -->.
61
Processing Instructions
Processing instructions (PIs) are an escape hatch to provide information to an application. The XML processor is required to pass them to the application. Syntax: <?target argument?> Example: <product> <name> Alarm Clock </name> <?ringBell 20?> <price> </price> </product> The names used in PIs may be declared as notations in order to formally identify them.
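How a processor hands a PI to the application can be sketched with Python's standard xml.parsers.expat module, which reports each PI through a callback. The function name collect_pis is an assumption for illustration.

```python
import xml.parsers.expat

DOCUMENT = (
    "<product><name> Alarm Clock </name>"
    "<?ringBell 20?>"
    "<price> </price></product>"
)


def collect_pis(document):
    """Return the (target, argument) pair of every processing instruction."""
    found = []

    def on_pi(target, data):
        # The parser splits the PI into its target and the remaining text.
        found.append((target, data))

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(document, True)
    return found


print(collect_pis(DOCUMENT))
```

The parser does not interpret the argument; deciding what ringBell 20 means is left entirely to the application.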
62
CDATA Sections In a document, a CDATA section instructs the parser to ignore most markup characters. Consider a source code listing in an XML document. It might contain characters that the XML parser would ordinarily recognize as markup (< and &, for example): <![CDATA[ *p = &q; b = (i <= 3); ]]> Comments are not recognized in a CDATA section.
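The behavior of a CDATA section can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree and a hypothetical listing wrapper element; a comment-like string is added inside the section to show that it is kept as text.

```python
import xml.etree.ElementTree as ET

# <listing> is a hypothetical wrapper element around the CDATA section.
DOCUMENT = (
    "<listing><![CDATA[ *p = &q; b = (i <= 3); <!-- kept as text --> ]]></listing>"
)

# Inside CDATA, '<', '&' and even comment delimiters are plain characters.
listing_text = ET.fromstring(DOCUMENT).text
print(listing_text)
```

Without the CDATA section, the same content would have to be written with &amp; and &lt; to be well-formed.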
63
XML Namespaces http://www.w3.org/TR/REC-xml-names (1/99)
A particular label, e.g., number, may denote different notions in different contexts. name ::= [prefix:]localpart <book xmlns:isbn="…"> <title> … </title> <number> 15 </number> <isbn:number> … </isbn:number> </book>
64
XML Namespaces syntactic: <number> , <isbn:number>
semantic: provide URL for schema <tag xmlns:mystyle="…"> … <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> The mystyle names are defined at the URL given in xmlns:mystyle.
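The distinction between number and isbn:number becomes visible once a namespace-aware parser resolves the prefix. The sketch below uses Python's standard xml.etree.ElementTree. The namespace URI is elided in the slides, so urn:example:isbn is only a placeholder, as are the element contents.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the slides elide the real value of xmlns:isbn.
ISBN_NS = "urn:example:isbn"

DOCUMENT = f"""<book xmlns:isbn="{ISBN_NS}">
  <title>Example</title>
  <number>15</number>
  <isbn:number>0000</isbn:number>
</book>"""

book = ET.fromstring(DOCUMENT)

# ElementTree expands each prefixed name to {namespace-uri}localpart.
tags = [child.tag for child in book]
print(tags)

# The two 'number' elements are addressed independently.
plain_number = book.findtext("number")
isbn_number = book.findtext("isbn:number", namespaces={"isbn": ISBN_NS})
print(plain_number, isbn_number)
```

The prefix itself carries no meaning; only the URI it is bound to identifies the namespace, so a document using a different prefix for the same URI would produce the same expanded names.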