Chapter 27 The World Wide Web and XML
Copyright © 2004 Pearson Addison-Wesley. All rights reserved.

Topics in this Chapter
- The Web and the Internet
- An Overview of XML
- XML Data Definition
- XML Data Manipulation
- XML and Databases
- SQL Facilities

The Web and the Internet
- Although often treated as synonymous, the Web and the Internet are two different things
- The Web is a gigantic, amorphous database
- The Internet is a giant network
- URLs (Uniform Resource Locators/Identifiers) are used to locate resources on the network
- Markup languages are used to interact with the database

Hypertext
- Hypertext Markup Language (HTML) is a simple language for creating and displaying documents
- Hypertext Transfer Protocol (HTTP) is used to transfer these documents over the Internet
- At each server, data can be served from system files or from databases
- The databases on web servers can be SQL databases

XML
- XML provides extensions that permit the markup language to interact with hypertext as well as many other languages, including SQL, and so is useful when implementing web databases
- An XML document normally begins with a header called a declaration, followed by an element consisting of a start tag, character data, and an end tag

XML
- Example element: <greeting kind="succinct">Hello, World.</greeting>
- XML declaration: the header that precedes the element
- XML element: start tag <greeting kind="succinct">, character data Hello, World., end tag </greeting>
- XML attribute: the attribute name is “kind” and its value is “succinct”

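The parts of the greeting document can be inspected with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the version="1.0" declaration is an assumption, since the slide does not show the declaration text.

```python
import xml.etree.ElementTree as ET

# The declaration text is assumed; the element comes from the slide.
document = (
    '<?xml version="1.0"?>'
    '<greeting kind="succinct">Hello, World.</greeting>'
)

root = ET.fromstring(document)

print(root.tag)     # name taken from the start tag
print(root.attrib)  # attributes carried by the start tag
print(root.text)    # character data between start tag and end tag
```

The parser consumes the declaration itself, so only the element, its attribute, and its character data are visible on the resulting object.
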
XML History
- XML was created in 1996 to overcome limitations in SGML and HTML
- SGML is large and complicated
- HTML fails to separate structural, semantic, and formatting metadata, and is not always “well-formed”
- XML has not supplanted HTML in web browsers, but it is used in other areas, especially data interchange

XML History
- SGML is large and complicated
- It allows users to define their own tags and give them meaning
- Example: You should specify the first parameter ……..

XML Applications
- Purchase orders, parts catalogues, and inventory records can be expressed in XML
- A database could consist of XML documents only, but it would NOT be relational
- XML can be used to represent relations, which could facilitate interchange between the Internet and relational databases

XML Applications
- Example: parts data represented as an XML document; the character data for each part:
    - P1: NUT, RED, 12, LONDON
    - P2: left-wing-part-10th-part-Bolt, Green, 17, Paris

XML Applications
- An XML information set is a document hierarchy

XML Hierarchy
- The root node is at the top, and it has children
- Each child has one parent
- Relations are structured; XML documents are said to be semi-structured because their rules are looser
- An API to XML’s document object model supports retrieval, insertion, deletion, and update (p. 901)

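The four kinds of operation on the document hierarchy can be sketched with Python's standard xml.dom.minidom module, which follows the W3C document object model. The tag names Parts, Part, PNUM, and CITY are assumptions chosen for illustration; the slides do not show the actual markup.

```python
from xml.dom.minidom import parseString

# Hypothetical tag names; only the values resemble the parts example.
doc = parseString(
    "<Parts>"
    "<Part><PNUM>P1</PNUM><CITY>London</CITY></Part>"
    "<Part><PNUM>P2</PNUM><CITY>Paris</CITY></Part>"
    "</Parts>"
)
root = doc.documentElement  # the root node of the hierarchy


def part_numbers(node):
    """Retrieval: collect the PNUM text of every Part below node."""
    return [
        part.getElementsByTagName("PNUM")[0].firstChild.data
        for part in node.getElementsByTagName("Part")
    ]


before = part_numbers(root)

# Insertion: build a new Part subtree and attach it to the root.
new_part = doc.createElement("Part")
pnum = doc.createElement("PNUM")
pnum.appendChild(doc.createTextNode("P3"))
new_part.appendChild(pnum)
root.appendChild(new_part)

# Update: change the character data of an existing node.
first_city = root.getElementsByTagName("CITY")[0]
first_city.firstChild.data = "Oslo"

# Deletion: detach the second Part (P2) from its parent.
second_part = root.getElementsByTagName("Part")[1]
root.removeChild(second_part)

after = part_numbers(root)
```

Every operation is expressed as navigation from a parent to its children, which reflects the hierarchical rather than relational nature of the document.
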
DTDs
- Document Type Definitions (DTDs) can be constructed using the DTD definition language
- DTDs are part of the XML standard
- A DTD can mirror the structure of a relation and then be used to format the output from queries
- In turn, the XML document produced can be used to generate a relation at the other end
- Text objects must be well-formed and valid

XML Applications: Revised Version
- P1: NUT, 12 (part COLOR is Red by default)
- P2: left-wing-part-10th-part-Bolt, Green, 17

XML Applications: Revised Version
- Attribute declaration: <!ATTLIST Partuple CITY (LONDON|Oslo|Paris) #REQUIRED COLOR (Red|Green|Blue) “Red”>
- P1: NUT (part COLOR is Red by default)
- P2: left-wing-part-10th-part-Bolt, Green, 17

Well-Formedness
- A textual object is well-formed if and only if:
    - It conforms to the grammar defined in the XML standard
    - Any textual object it references is well-formed
- Examples of fatal flaws:
    - Start and end tags don’t match, or are missing
    - More than one root element is included

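Both fatal flaws can be observed by handing text to a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical, and the check covers only the grammar condition, not the well-formedness of referenced objects.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects the input as soon as it finds a fatal flaw.
        return False


good = "<greeting>Hello, World.</greeting>"
mismatched = "<greeting>Hello, World.</farewell>"  # tags do not match
missing_end = "<greeting>Hello, World."            # end tag is missing
two_roots = "<a/><b/>"                             # more than one root

results = [is_well_formed(t) for t in (good, mismatched, missing_end, two_roots)]
print(results)
```
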
Validity
- A textual object is valid if and only if it is well-formed and it conforms to a specified DTD
- DTDs can support uniqueness and referential constraints via the ID and IDREF attribute types
- These constraints do not function as keys, but can be used to transmit information from one relvar to another

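The effect of ID and IDREF can be illustrated by hand. Python's standard library does not validate against DTDs, so the sketch below is a hand-rolled check, not a DTD validator: it assumes a hypothetical attribute pid that a DTD would declare as type ID and a hypothetical attribute part that it would declare as type IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are assumptions.
doc = ET.fromstring(
    "<Data>"
    '<Part pid="P1"/><Part pid="P2"/>'
    '<Shipment part="P1"/><Shipment part="P9"/>'
    "</Data>"
)


def check_ids(root, id_attr, idref_attr):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    dangling = sorted(
        {
            e.get(idref_attr)
            for e in root.iter()
            if e.get(idref_attr) is not None and e.get(idref_attr) not in ids
        }
    )
    return duplicates, dangling


duplicates, dangling = check_ids(doc, "pid", "part")
print(duplicates, dangling)
```

The check scans the whole document rather than one element type, because ID values must be unique across the entire document. That document-wide scope is one reason the constraints do not behave like relational keys.
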
Limitations of DTDs
- DTDs do not use XML syntax, so they cannot be processed as XML documents by XML parsers
- Since everything in this arena is a character string, data type support is lacking
- They enforce an ordering of elements that is contra-relational
- They are still beneficial because they enforce a standard that is widely used

XML Schema
- XML Schema is an XML derivative, so schemas can be interpreted by XML parsers
- Schemas are written using a collection of names from a namespace
- The namespace specification: xmlns:xsd=“…”
- It is considerably more prolix than a DTD
- XML Schema can enforce primitive types and some derived types
- XML Schema types have essentially no operators because “types” are still character strings

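Because a schema is itself XML, an ordinary parser can read it. The sketch below uses Python's standard xml.etree.ElementTree module to parse a small schema and list its declarations. The element names PNUM and WEIGHT are assumptions for illustration, and the namespace URI shown is the standard W3C XML Schema namespace. The code only parses the schema; the standard library has no XML Schema validator.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A hypothetical schema fragment declaring two typed elements.
schema_text = f"""
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="PNUM" type="xsd:string"/>
  <xsd:element name="WEIGHT" type="xsd:decimal"/>
</xsd:schema>
"""

schema = ET.fromstring(schema_text.strip())

# ElementTree spells qualified names as {namespace-uri}local-name.
declared = {
    element.get("name"): element.get("type")
    for element in schema.findall(f"{{{XSD_NS}}}element")
}
print(declared)
```

The verbosity of even this two-element schema hints at why the notation is described as prolix.
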
XML Data Manipulation
- XQuery is based on XPath, which means that it is a read-only facility for traversing XML’s hierarchical paths
- Because XQuery can report horizontal and vertical subsets, and combine the results, it is said to support “select, project, and join”
- XUpdate is in the early planning stages, but presumably will support updates
- For now, updates require proprietary solutions

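The “select, project, and join” idea can be approximated without an XQuery engine. The sketch below is not XQuery: it uses the limited XPath subset in Python's standard xml.etree.ElementTree module plus ordinary Python for the join. All tag names and data values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; tag names are assumptions.
parts = ET.fromstring(
    "<Parts>"
    "<Part><PNUM>P1</PNUM><COLOR>Red</COLOR><CITY>London</CITY></Part>"
    "<Part><PNUM>P2</PNUM><COLOR>Green</COLOR><CITY>Paris</CITY></Part>"
    "</Parts>"
)
suppliers = ET.fromstring(
    "<Suppliers>"
    "<Supplier><SNUM>S1</SNUM><CITY>London</CITY></Supplier>"
    "<Supplier><SNUM>S2</SNUM><CITY>Paris</CITY></Supplier>"
    "</Suppliers>"
)

# Select (horizontal subset): only Part elements whose COLOR is Red.
red_parts = parts.findall("./Part[COLOR='Red']")

# Project (vertical subset): keep only the PNUM of each selected part.
red_part_numbers = [part.findtext("PNUM") for part in red_parts]

# Join: pair suppliers and parts located in the same city.
colocated = [
    (supplier.findtext("SNUM"), part.findtext("PNUM"))
    for supplier in suppliers.findall("Supplier")
    for part in parts.findall("Part")
    if supplier.findtext("CITY") == part.findtext("CITY")
]
print(red_part_numbers, colocated)
```

All three steps only read the documents, which matches the read-only character of path-based querying.
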
XML and Databases
- Three approaches:
    - Store XML documents as attributes
    - Shred documents into attributes
    - Store XML documents in “XML databases”

XML Documents as Attributes
- Define a new type, XMLDOC
- As a new type, XMLDOC should have operators defined that can retrieve data in the manner of XQuery and that can check for well-formedness and validity

XML Documents: Shred and Publish
- An XML document may be shredded into its components, which are then stored as attributes
- Attributes can be recombined and published as XML documents
- This is an effective way for SQL databases to interact with the Web
- Relational databases do not store hierarchies, nor are they intrinsically ordered, so shred and publish may not be “nonloss”

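Shred and publish can be sketched end to end with Python's standard sqlite3 and xml.etree.ElementTree modules. The table name part, its columns, and the XML tag names are all assumptions for illustration, not a vendor facility.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source document; tag names are assumptions.
source = (
    "<Parts>"
    "<Part><PNUM>P1</PNUM><PNAME>Nut</PNAME><CITY>London</CITY></Part>"
    "<Part><PNUM>P2</PNUM><PNAME>Bolt</PNAME><CITY>Paris</CITY></Part>"
    "</Parts>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (pnum TEXT PRIMARY KEY, pname TEXT, city TEXT)")

# Shred: each Part element becomes one row, each child one attribute value.
for part in ET.fromstring(source).findall("Part"):
    conn.execute(
        "INSERT INTO part VALUES (?, ?, ?)",
        (part.findtext("PNUM"), part.findtext("PNAME"), part.findtext("CITY")),
    )

# Publish: rebuild an XML document from the rows.
published = ET.Element("Parts")
rows = conn.execute("SELECT pnum, pname, city FROM part ORDER BY pnum")
for pnum, pname, city in rows:
    part = ET.SubElement(published, "Part")
    ET.SubElement(part, "PNUM").text = pnum
    ET.SubElement(part, "PNAME").text = pname
    ET.SubElement(part, "CITY").text = city

published_text = ET.tostring(published, encoding="unicode")
print(published_text == source)
```

The round trip reproduces the source here only because ORDER BY pnum happens to match the original document order and the publishing code hard-codes the nesting. The table itself records neither, which is why shred and publish may lose information in general.
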
SQL Facilities
- XML Collection will offer support for shred and publish, where the publish feature supports publishing both the XML data and its schema
- XML Column will offer a new built-in type, XML, which will come with an XMLGEN operator to publish XML documents
- Database vendors offer built-in functions that can read and write elements within XML attribute values, e.g., XMLFILETOCLOB