Published by Ιπποκράτης Ζέρβας. Modified over 6 years ago.
1
XML I
Jagdish Gangolly
State University of New York at Albany
Inf 703, Fall, (Jagdish S. Gangolly) 11/21/2018
2
Introduction to XML
  Structured Data vs. Documents
  Document: Structure, Form, and Content
  Markup languages
  Basic HTML & its shortcomings
  Why XML
  Fundamentals of DTDs
3
XML I: Structured Data vs. Documents I
In structured data, the schema (metadata) is kept separate from the data.
In documents, the metadata (pertaining to form, structure, and content) is contained in the document itself. Such data is therefore sometimes called self-describing.
4
XML I: Structured Data vs. Documents II
Two views of documents
  Physical view: a byte-stream
  Logical view: an abstract data structure (a tree of nodes)
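The two views can be sketched in Java: the input string stands in for the physical stream, and the DOM parser bundled with the JDK turns it into the logical tree. This is a minimal illustration, not part of the original slides; the class name DomTreeSketch and the sample letter document are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a character stream (physical view) into a
// DOM tree (logical view) and renders the element nesting as an outline.
class DomTreeSketch {

    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Depth-first walk; only element nodes are printed, text nodes are skipped.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Illustrative document, loosely based on the letter example in these notes.
        System.out.print(outline(
                "<letter><sender><name>A. Smith</name></sender><date>today</date></letter>"));
    }
}
```

The outline shows one line per element, indented by its depth in the tree, which is the structure an application works with once the byte-stream has been parsed.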
5
XML I: Document: Structure, Form, and Content I
Structure: Data about the way the document is structured. For example, a letter might consist of
  sender’s name and address
  addressee’s name and address
  date
  salutation
  paragraphs in the letter
  closing
  signature
6
XML I: Document: Structure, Form, and Content II
Form: Data about how the document should appear to the reader (formatting). For example, in a letter, you need to specify things like
  Left-justification of sender’s name
  Right-justification of sender’s address
  Bolding of sender’s name, italicising of sender’s address
  Italicising/bolding of certain text
  …
7
XML I: Document: Structure, Form, and Content III
Content: The semantics of the document content. For example, in a business letter you may want to tag the content to specify its semantics:
  If the letter refers to a purchase order, the meaning of that reference must be indicated by the tag
  If it refers to an invoice, it must be similarly tagged
Such tagging makes it possible to integrate databases consisting of self-describing as well as structured data.
8
XML I: Markup languages
TeX/LaTeX
SGML
HTML
XML
  ebXML, XBRL, …
  MathML, XGMML, …
9
XML I: Basic HTML & its shortcomings
Fixed tagset: no extensibility
Virtually no content tagging
Mostly formatting tags
Lack of discipline in document generation; very forgiving browsers
Almost exclusive preoccupation with how the document looks
Difficult/awkward/inefficient to interface with structured databases
10
XML I: Why XML?
Extensible: one can develop a custom tagset
A DTD is not required, but you can specify one
Possible to separate content from structure/form
Possible to develop custom tagsets based on an object model of the domain
Possible to interface efficiently with backend structured (usually relational) databases
Possible to use heterogeneous namespaces and schemas to build modular e-commerce systems
Evolving standards to support e-business
11
XML I: Fundamentals of DTDs I
Specified using an EBNF (Extended Backus-Naur Form) syntax.
With the adoption of the XML Schema specifications, much of what DTDs do will in future be handled by schemas.
Constraints:
  Well-formedness
    Tree structure (a single root element; every other element has exactly one parent)
    Attribute values must be quoted
    …
  Validity (the document has a DTD to which it conforms)
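The well-formedness constraints above can be checked without any DTD. The sketch below assumes the SAX parser bundled with the JDK (rather than a specific Xerces class) and treats any fatal parse error as "not well-formed"; the class name WellFormedSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a non-validating parse succeeds exactly when the
// document is well-formed; validity against a DTD is not checked here.
class WellFormedSketch {

    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, i.e. well-formedness violations.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up or input not read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));   // proper tree
        System.out.println(isWellFormed("<a><b></a>"));    // b is never closed
        System.out.println(isWellFormed("<a x=1/>"));      // attribute value not quoted
        System.out.println(isWellFormed("<a/><b/>"));      // two root elements
    }
}
```

Only the first document in main is well-formed; the other three each break one of the constraints listed above.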
12
XML I: Fundamentals of DTDs I
<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person note CDATA #IMPLIED>
<!ATTLIST person contr (true|false) 'false'>
<!ATTLIST person salary CDATA #IMPLIED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA '…'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>
<!ATTLIST link subordinates IDREFS #IMPLIED>
<!NOTATION gif PUBLIC '-//APP/Photoshop/4.0' 'photoshop.exe'>
13
XML I: Fundamentals of DTDs II
DTD Syntax: XML Declaration & Character Encoding
  <?xml version='1.0' encoding='utf-8' ?>   XML declaration
  <? … ?>   processing instructions
  UTF-8: variable-width encoding built from 8-bit units; ideal for mostly ASCII data
  <!-- … -->   comments
  Character entities & pre-declared entities
  CDATA: character data in quoted attribute values; not parsed for markup
  PCDATA: element content that is parsed
14
XML I: Fundamentals of DTDs III
DTD Syntax (Continued): Element declarations
  <!ELEMENT name content-model>
  Repetition-factor characters: * 'zero or more', + 'one or more', ? 'zero or one'
  Content models
    EMPTY (neither text nor child elements)
      <!ELEMENT br EMPTY>
    ANY (any combination of text and child elements)
      <!ELEMENT container ANY>
    Children-only content models
      <!ELEMENT exchange (greeting, response)>
    Mixed content models
      <!ELEMENT p (#PCDATA | a | ul | b | i | em)*>
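To see a children-only content model enforced, the sketch below validates documents against the exchange declaration from this slide, placed in an internal DTD subset. It assumes the validating SAX parser bundled with the JDK; the class name ContentModelSketch and the greeting/response declarations as (#PCDATA) are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects validity errors reported for a root element
// checked against a fixed internal DTD subset.
class ContentModelSketch {

    // greeting must come first, then response; both hold text only (assumed).
    private static final String DTD =
            "<!DOCTYPE exchange [\n"
          + "  <!ELEMENT exchange (greeting, response)>\n"
          + "  <!ELEMENT greeting (#PCDATA)>\n"
          + "  <!ELEMENT response (#PCDATA)>\n"
          + "]>\n";

    static List<String> validityErrors(String rootElement) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(DTD + rootElement)),
                    new DefaultHandler() {
                        @Override
                        public void error(SAXParseException e) {
                            errors.add(e.getMessage());   // validity errors arrive here
                        }
                    });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document is not well-formed or unreadable", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validityErrors(
                "<exchange><greeting>Hi</greeting><response>Hello</response></exchange>"));
        System.out.println(validityErrors(
                "<exchange><response>Hello</response><greeting>Hi</greeting></exchange>"));
    }
}
```

The first document matches the content model and yields no errors; the second has the children in the wrong order, so the parser reports at least one validity error even though the document is still well-formed.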
15
XML I: Fundamentals of DTDs IV
DTD Syntax (Continued): Attribute declaration
  <!ATTLIST element-name attribute-definitions>
  where each attribute definition has
    attribute-name attribute-type default-declaration
  <!ELEMENT multiAttribute EMPTY>
  <!ATTLIST multiAttribute
      name      CDATA             #REQUIRED
      nickname  ID                #REQUIRED
      bfriend   IDREF             #IMPLIED
      penname   NMTOKEN           #IMPLIED
      authors   NMTOKENS          #REQUIRED
      answer    (YES | NO)        "NO"
      method    CDATA             #FIXED "TAXI"
      goto      (DISCO | MOVIES)  #REQUIRED >
16
XML I: Fundamentals of DTDs V
DTD Syntax (Continued):
  CDATA: character data (any quoted text)
  ID: a name whose value must be unique in the document
  IDREF: text equal to the value of an ID in the document
  NMTOKEN: restricted text containing only 'name characters'; cannot contain whitespace
  NMTOKENS: whitespace-separated list of NMTOKEN items
  (YES | NO): enumerated type
  #REQUIRED: the attribute is required
  #IMPLIED: the attribute is optional
  #FIXED: the attribute must always have the specified default value
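Default declarations are visible to an application: when an attribute with a default or #FIXED value is omitted, the parser supplies it. The sketch below reads the attributes of the root element through the JDK's DOM parser. The element name trip and the class name AttributeDefaultsSketch are assumptions for the example; the three attribute definitions are taken from the multiAttribute declaration above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: returns the root element's attributes as the
// application sees them after the DTD's default declarations are applied.
class AttributeDefaultsSketch {

    static Map<String, String> rootAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
            Map<String, String> result = new TreeMap<>();   // sorted by attribute name
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<!DOCTYPE trip [\n"
              + "  <!ELEMENT trip EMPTY>\n"
              + "  <!ATTLIST trip\n"
              + "      answer (YES | NO)       \"NO\"\n"
              + "      method CDATA            #FIXED \"TAXI\"\n"
              + "      goto   (DISCO | MOVIES) #REQUIRED>\n"
              + "]>\n"
              + "<trip goto=\"MOVIES\"/>";   // only goto is written in the document
        System.out.println(rootAttributes(xml));
    }
}
```

Although the document specifies only goto, the map also contains answer and method with the values declared in the DTD.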
17
XML II
  General Entities & Parameter Entities
  XML Schema
  XML Namespaces
  XSLT
  XML Parsing with DOM and SAX
18
General Entities & Parameter Entities
General entities are used in XML documents, whereas parameter entities are used in DTDs.
Entities can be referenced; parameter entities can be referenced only within DTDs.
In the DTD, a parameter entity declaration has the % character after the keyword ENTITY.
General entities are referenced in the document or DTD as &name;, parameter entities are referenced in the DTD as %name;.
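Both kinds of entity can be seen at work in one small document. In the sketch below, the general entity suny is referenced in the document content, while the parameter entity decls is referenced inside the DTD, where it contributes the declaration of the root element. The entity names, the note element, and the class name EntitySketch are assumptions for the example; parsing uses the JDK's validating DOM parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates a document and returns the root element's
// text after general entity references have been expanded.
class EntitySketch {

    static final String SAMPLE =
            "<!DOCTYPE note [\n"
          + "  <!ENTITY suny 'State University of New York'>\n"    // general entity
          + "  <!ENTITY % decls '<!ELEMENT note (#PCDATA)>'>\n"    // parameter entity
          + "  %decls;\n"                                          // referenced in the DTD
          + "]>\n"
          + "<note>&suny; at Albany</note>";                       // referenced in content

    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;   // treat validity errors as failures
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document is invalid or unreadable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(SAMPLE));
    }
}
```

The document validates only because %decls; supplies the declaration of note, and the returned text shows &suny; replaced by its value.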
19
XML Schema I
The Purchase Order, po.xml
<?xml version="1.0"?>
<purchaseOrder orderDate=" ">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate> </shipDate>
    </item>
  </items>
</purchaseOrder>
20
XML Schema II (From XML Schema Part 0: Primer)
<xsd:schema xmlns:xsd="…">
  <xsd:annotation>
    <xsd:documentation>
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
REFER TO CLASS HANDOUT FOR REST OF THE SCHEMA
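Checking a document against a schema can be sketched with the javax.xml.validation API in the JDK. The schema used below is a deliberately cut-down variant of the purchase order schema: it keeps only the optional comment element and the orderDate attribute, because the USAddress and Items types are in the class handout and not reproduced here. The class name SchemaValidationSketch and the sample documents are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a document against a reduced purchase
// order schema (comment element and orderDate attribute only).
class SchemaValidationSketch {

    private static final String XSD =
            "<xsd:schema xmlns:xsd='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'>"
          + "<xsd:element name='purchaseOrder' type='PurchaseOrderType'/>"
          + "<xsd:element name='comment' type='xsd:string'/>"
          + "<xsd:complexType name='PurchaseOrderType'>"
          + "<xsd:sequence><xsd:element ref='comment' minOccurs='0'/></xsd:sequence>"
          + "<xsd:attribute name='orderDate' type='xsd:date'/>"
          + "</xsd:complexType>"
          + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<purchaseOrder orderDate='2018-11-21'><comment>Hurry</comment></purchaseOrder>"));
        System.out.println(isValid("<purchaseOrder orderDate='yesterday'/>"));
    }
}
```

Unlike a DTD, the schema checks the datatype of orderDate: a value that is not a lexically valid xsd:date is rejected.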
21
XML Namespaces I
Element (tag) names and attribute names can be specified by the XML application/DTD. Therefore, the same tag/attribute names may be used by different applications.
When a document contains portions of other documents, you can have name clashes, i.e., use of the same tag with more than one sense.
Specifying namespaces in XML solves the problem of name clashes.
You can name a default namespace in a document.
READ EXAMPLES IN WROX TEXT
22
XML Namespaces II
URIs are used to name namespaces, but they are NOT used in validation; such URIs are declared only to provide unique names.
You can use xmlns as an attribute of elements to disambiguate names.
Child elements can be removed from the default namespace by setting the xmlns attribute to an empty string (xmlns="").
An XML namespace consists of three subspaces: element names, attribute names without prefixes, and attribute names with prefixes (see example in WROX text).
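The effect of prefixed namespaces, a default namespace, and xmlns="" can be inspected with a namespace-aware DOM parse. In the sketch below, the urn:example:... namespace names, the element names, and the class name NamespaceSketch are all made up for illustration; the URIs are never dereferenced.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: lists every element as {namespace-URI}local-name,
// in document order; elements in no namespace appear as {}name.
class NamespaceSketch {

    static final String SAMPLE =
            "<po:order xmlns:po='urn:example:po' xmlns='urn:example:default'>"
          + "<item><note xmlns=''>plain</note></item>"
          + "</po:order>";

    static List<String> qualifiedNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String uri = node.getNamespaceURI();   // null when in no namespace
            out.add("{" + (uri == null ? "" : uri) + "}" + node.getLocalName());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(qualifiedNames(SAMPLE));
    }
}
```

Here order is in the namespace bound to the po prefix, item inherits the default namespace, and note is removed from it by xmlns="".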
23
XML Parsing with SAX
These notes use programs in the course text, Java and XML by Brett McLaughlin.
SAX
  Needs less memory, since it presents the document as a sequence of events
  Associates an event with each tag
  Efficient, but awkward if processing an element depends on preceding/succeeding elements in the document
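The awkwardness mentioned in the last point shows up as soon as a handler needs context: SAX reports each tag on its own, so the application has to remember where it is. The sketch below keeps an explicit stack of open elements to pick out name elements with a particular parent. It uses the JDK's bundled SAX parser; the class name SaxContextSketch and the shortened purchase order document are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: returns the text of each <name> element whose parent
// is the given element, tracking context by hand with a stack.
class SaxContextSketch {

    static final String SAMPLE =
            "<purchaseOrder>"
          + "<shipTo><name>Alice Smith</name></shipTo>"
          + "<billTo><name>Robert Smith</name></billTo>"
          + "</purchaseOrder>";

    static List<String> namesUnder(String parent, String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> open = new ArrayDeque<>();   // currently open elements
            private final StringBuilder text = new StringBuilder();  // text since last start tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.push(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();   // after this, the top of the stack is the parent
                if ("name".equals(qName) && parent.equals(open.peek())) {
                    names.add(text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesUnder("shipTo", SAMPLE));
    }
}
```

With a tree-based API the same question is a single parent lookup; with SAX the bookkeeping is the application's job, which is the price paid for the low memory use.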
24
Parsing XML in SAX I (using Xerces Parser)
1. Instantiate the parser class (in Xerces, an implementation of the XMLReader interface)
    XMLReader parser = new SAXParser();
2. Provide for command-line input of the URI to be parsed
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("give just the uri");
            System.exit(0);
        }
        String uri = args[0];
        SAXParserDemo parserDemo = new SAXParserDemo();
        parserDemo.performDemo(uri);
    }
(performDemo is a method defined in the SAXParserDemo class)
25
Parsing XML in SAX II
Parse the document
    public void performDemo(String uri) {
        try {
            XMLReader parser = new SAXParser();
            parser.parse(uri);
        } catch (IOException e) {
            System.out.println("error reading uri: " + e.getMessage());
        } catch (SAXException e) {
            System.out.println("error parsing uri: " + e.getMessage());
        }
    }
26
Handlers - Interfaces
  ContentHandler
  ErrorHandler
  DTDHandler
  EntityResolver
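Handlers are registered on the XMLReader before parsing. As one illustration, the sketch below registers an EntityResolver that supplies the external DTD from a string instead of letting the parser fetch it, together with a ContentHandler that reads an attribute defaulted by that DTD. The XMLReader is obtained from the JDK's SAXParserFactory rather than by instantiating the Xerces SAXParser class used in these notes; the class name EntityResolverSketch, the system identifier person.dtd, and the reduced person DTD are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: resolves the external DTD from memory and reports the
// value of the contr attribute that the DTD supplies by default.
class EntityResolverSketch {

    // Reduced DTD (assumed): person is empty and contr defaults to 'false'.
    private static final String DTD =
            "<!ELEMENT person EMPTY>\n"
          + "<!ATTLIST person contr (true|false) 'false'>";

    static final String SAMPLE = "<!DOCTYPE person SYSTEM 'person.dtd'><person/>";

    static String defaultedContr(String xml) {
        String[] seen = new String[1];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // EntityResolver: called before the parser opens any external entity.
            reader.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader(DTD)));
            // ContentHandler: receives the element together with defaulted attributes.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    if ("person".equals(qName)) {
                        seen[0] = atts.getValue("contr");
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(defaultedContr(SAMPLE));
    }
}
```

ErrorHandler and DTDHandler are registered the same way, through setErrorHandler and setDTDHandler on the XMLReader.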
27
ContentHandler I
ContentHandler
  public void setDocumentLocator(Locator locator)
  public void startDocument()
  public void endDocument()
  public void processingInstruction(String target, String data)
  public void startPrefixMapping(String prefix, String uri)
  public void endPrefixMapping(String prefix)
  public void startElement(String namespaceURI, String localName, String rawName, Attributes atts)
  public void endElement(String namespaceURI, String localName, String rawName)
  public void characters(char[] ch, int start, int length)
  public void ignorableWhitespace(char[] ch, int start, int length)
  public void skippedEntity(String name)
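The order in which these callbacks fire can be observed by recording them. The sketch below overrides a few of the methods via DefaultHandler (which implements ContentHandler with empty methods) and parses with the JDK's bundled SAX parser. Because a parser may deliver one run of text in several characters calls, the text is buffered and recorded as a single entry; the class name SaxTraceSketch and the sample document are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records ContentHandler callbacks in the order received.
class SaxTraceSketch {

    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // Emits buffered text (if any) as one "characters" entry.
            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) {
                    events.add("characters " + t);
                }
                text.setLength(0);
            }

            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flushText();
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                events.add("endElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);   // third argument is a length, not an end index
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<name><given>Alice</given><family>Smith</family></name>")) {
            System.out.println(event);
        }
    }
}
```

Each start tag, run of text, and end tag appears in document order, bracketed by startDocument and endDocument.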
28
ContentHandler II
Imports
  org.xml.sax.Attributes
  org.xml.sax.ContentHandler
  org.xml.sax.Locator
  org.xml.sax.SAXException
  org.xml.sax.XMLReader
  org.apache.xerces.parsers.SAXParser