Published by Michael Garrett. Modified over 9 years ago.
1
XML – Extensible Markup Language
2
Objectives To understand various ways in which XML can be used History of XML Syntax of XML Difference between HTML, XML and XHTML XML Document Type Definitions (DTDs) XML Schemas To understand types of XML Parsers Validating vs. Non-Validating Parsers To understand different XML Parser Interfaces Tree Based Interface Standard : DOM Event Based Interface Standard : SAX Evaluating Parsers Which parser to use?
3
History of XML The World Wide Web Consortium (W3C) is an international consortium where member organizations, a full-time staff, and the public work together to develop Web standards. Tim Berners-Lee, who invented the World Wide Web in 1989, and others created the W3C in 1994. SGML (Standard Generalized Markup Language) grew out of GML, which was developed at IBM in 1969, and became an ISO standard in 1986. SGML is a semantic and structural language for text documents, and it is complicated. The XML Working Group was formed under the W3C in 1996. In 1998 the W3C introduced XML 1.0. Extensible Markup Language (XML) is a subset of SGML.
4
What is XML? XML stands for eXtensible Markup Language. XML is a universal method of representing data, used in applications, on the web, and for data exchange. XML is a markup language much like HTML, but it is used for different purposes. XML is not a replacement for HTML.
5
What is XML? XML was designed to describe data XML is a cross-platform, software and hardware independent tool for transmitting or exchanging information. XML is an open-standards-based technology Extensible Both Human and machine readable XML Standard XML 1.0 (1998). XML 1.1 (Feb 2004)
6
What Exactly is XML used for? Storing data in a structured manner (tree structure). Storing configuration information, typically application data that is not stored in a database. Most server software uses configuration files in XML format.
7
Contd… Transmitting data between applications Overcomes Problems in Client Server applications which are cross-platform in nature Ex: A Windows program talking to a mainframe XML is a universal, standardized language used to represent data such that it can be both processed independently and exchanged between programs and applications and between clients and servers Disparate systems can exchange information in a common format
8
XML Syntax The syntax rules of XML are very simple and very strict. XML tags are not predefined; you must define your own tags (the example wraps the text GCET in a user-defined tag). All XML elements must have a closing tag (the example closes the tag opened around "This is a paragraph").
9
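Contd… XML tags are case sensitive: an element whose opening and closing tags differ only in case (the "This is incorrect" example) is rejected, while one whose tags match exactly (the "This is correct" example) is accepted. All XML elements must be properly nested: the "Jill Jack" example is incorrect when the outer element is closed before the inner one, and correct when the inner element is closed first. Attribute values must always be quoted: the "reynolds" example is incorrect with an unquoted attribute value, and correct once the value is enclosed in quotes.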
10
XML Syntax All XML documents must have a root element.....
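The syntax rules on the last three slides can be checked mechanically, because a parser rejects any document that breaks them. The following is a minimal sketch using the JAXP DOM parser with validation off. The class name WellFormedCheck and the tag names in the samples (p, Message, b, i, person, id) are assumptions made for illustration, since the original markup did not survive on the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedCheck {

    /** Returns true if the text is well-formed XML; no DTD or schema is consulted. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation is reported as a fatal error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>This is a paragraph</p>",           // closing tag present
            "<p>This is a paragraph",               // closing tag missing
            "<Message>This is incorrect</message>", // case mismatch
            "<b><i>Jill Jack</b></i>",              // improperly nested
            "<b><i>Jill Jack</i></b>",              // properly nested
            "<person id=1>reynolds</person>",       // unquoted attribute value
            "<person id=\"1\">reynolds</person>",   // quoted attribute value
            "<a/><b/>"                              // more than one root element
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Running main prints one line per sample, so each rule can be tried in isolation by editing the strings.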
11
XML Comments Comments in XML use the same syntax as in HTML: <!-- comment text --> Example data: John John@jerry.com
12
XML Code John John@jerry.com Tom
13
Extensibility in XML A typical XML document is made up of tags enclosing the data; tag names describe the data Because the language is extensible, you can create tags that are specific to your need
14
Contd… For example, your document may contain tags to structure information about employees. The tag names identify each piece of employee information. Data stored in XML is self-descriptive: one can understand the data just by looking at the tag names.
15
XML – Exchanging Info Between Apps Convert information stored in the database (or any other format) to an XML format Once it is in XML format, other applications/programs can parse (read) the XML document, which is made up of the initial data XML parsers are freely available and are part of many new programming languages
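The exchange described on this slide can be sketched in a few lines: one side turns a record into XML text, and the other side parses that text back. This is an illustrative sketch only. The class EmployeeExport, the tag names employee, name, and department, and the sample values are all assumptions, not taken from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class EmployeeExport {

    /** Producer side: turns one record (e.g. a database row) into XML text. */
    static String toXml(String name, String department) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element employee = doc.createElement("employee");
            doc.appendChild(employee);
            employee.appendChild(field(doc, "name", name));
            employee.appendChild(field(doc, "department", department));

            // Serialize the tree; special characters are escaped automatically.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Consumer side: parses the XML text and reads one field back. */
    static String readField(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element field(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.setTextContent(text);
        return element;
    }

    public static void main(String[] args) {
        String xml = toXml("Kiran", "R&D");
        System.out.println(xml);
        System.out.println(readField(xml, "department"));
    }
}
```

The consumer needs no knowledge of how the producer stored the record; it only needs the agreed tag names.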
16
Contd… [Figure: an application, a spreadsheet package, a CAD package, and a statistical processing package exchanging data with a database through XML]
17
Content: XML Doc. Structure: DTD/XSD. Presentation: XSL. XSD: XML Schema Definition. DTD: Document Type Definition. XSL: Extensible Stylesheet Language.
18
Document Type Declaration and Document Type Definition (DTD) A DTD (Document Type Definition) is used to enforce structure requirements for an XML document. The document type declaration contains a reference to the DTD and tells the parser which DTD to use for validation.
19
Contd… <!DOCTYPE customers [ ]> John Conlon John@jerry.com
20
XML Schema An XML based alternative to DTD Richer and more useful than DTDs Written in XML and Simpler than DTDs Support data type validation (DTD does not support data type validation)
21
Harrison Ford hford@famous.org Julie jr@pw.com
23
Simple XML Elements with Pre-defined Data Types Simple XML Element: An XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement: <xsd:element name="element_name" type="type_name"/>
24
Contd… where "element_name" is the name of the XML element, and "type_name" is one of the data type names pre-defined in XSD. XSD pre-defined data types are divided into 7 groups, including: Numeric data types, Date and time data types, String data types, Binary data types, Boolean data type
25
XSD Syntax Simple XML Elements with Extended Data Types Simple XML Element: An XML element that has no child elements and attributes. Simple XML elements can be defined by using the pre-defined XSD data types.
26
They can also be defined by using extended data types, which are defined by "simpleType" statements: XSD facet statements where "element_name" is the name of the XML element, "xsd:type_name" is a pre-defined data type serving as the base data type, and "my_type_name" is the new data type extended from the base data type.
27
Complex XML Elements Complex XML Element: An XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements: XSD Syntax
28
...... where "attribute" statement is used to define an attribute, and "sequence" statement is used to define the group of child elements, and the order the child elements should appear in the XML structure. Note that "attribute" statements must appear after the child element definition statements.
29
XSD Syntax Empty XML Elements Empty XML Element: A special complex XML element that has one attribute or more and no child text nodes. Empty XML elements must be defined with complex data types in the following format:...
30
XSD Syntax Anonymous Data Types If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:
31
defines "my_data_type" which has a "setting" element, which has an anonymous data type defined inline.
32
Well-formed XML Documents A document is made of elements. There is exactly one element, called the root or document element, that contains all the others. All other elements, delimited by start- and end-tags, nest properly within each other. Attributes, if any, must have their values enclosed in quotes.
33
Valid XML Documents An XML document is valid if it has an associated DTD or Schema and if the document complies with the constraints expressed in it. If an XML document is valid, it is also well-formed.
34
Document Type Definitions (DTDs) A DTD describes the syntax of a document: which elements may appear in the XML document, and what their contents and attributes are. Need for DTD: a validating parser (a program) can be used to check whether XML data adheres to the rules in the DTD. The parser can do appropriate error handling if there are any violations. A validity error is not necessarily a fatal error, but some applications may treat it as one.
35
Document Type Declarations A valid XML document must include the reference to DTD which validates it Types of DTD Internal DTD: DTD can be embedded into XML document External DTD: DTD can be in a separate file
36
Internal DTD DTD embedded in the XML document The declarations appear between [ and ] E.g. AddressBook.xml
37
<!DOCTYPE AddressBook [ ]> Ram M G Road Bangalore
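To show what an internal DTD does in practice, the sketch below embeds a DTD in the document text and parses it with validation switched on. The declarations are reconstructed from the description given later in the deck (AddressBook holds one or more Address elements; Address holds Name, Street, and City; each of those holds character data), so treat them as an assumption about the lost slide content. The class name DtdValidation is likewise hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidation {

    // Internal DTD: the declarations sit between [ and ] in the DOCTYPE.
    static final String DTD =
          "<!DOCTYPE AddressBook [\n"
        + "  <!ELEMENT AddressBook (Address+)>\n"
        + "  <!ELEMENT Address (Name, Street, City)>\n"
        + "  <!ELEMENT Name (#PCDATA)>\n"
        + "  <!ELEMENT Street (#PCDATA)>\n"
        + "  <!ELEMENT City (#PCDATA)>\n"
        + "]>\n";

    /** Parses with validation on and returns the validity errors the parser reports. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity errors are recoverable: record them and keep parsing.
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness errors are fatal: stop.
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DTD + "<AddressBook><Address><Name>Ram</Name>"
                + "<Street>M G Road</Street><City>Bangalore</City></Address></AddressBook>";
        String missingCity = DTD + "<AddressBook><Address><Name>Ram</Name>"
                + "<Street>M G Road</Street></Address></AddressBook>";
        System.out.println(validityErrors(valid));
        System.out.println(validityErrors(missingCity));
    }
}
```

Both sample documents are well-formed; only the second one violates the DTD, which is why the error arrives through error() rather than fatalError().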
38
External DTD DTD is present in separate file Example The DTD for AddressBook.xml is contained in a file AddressBook.dtd AddressBook.xml contains only XML Data with a reference to the DTD file AddressBook.xml
39
Ram M G Road Bangalore
40
Anatomy of DTD – Defining new XML tags (Elements) Syntax: <!ELEMENT element_name content_specification> element_name: Specifies the name of the XML tag. content_specification: Specifies what the element may contain: #PCDATA (parsed character data, in which markup and entity references are recognized), nested elements, EMPTY, or ANY (generally avoided). CDATA is not an element content type: it is an attribute type, and text that must not be parsed is written in a CDATA section, where it is kept exactly as is.
41
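Example: <!ELEMENT Street (#PCDATA)> declares that element Street contains parsed character data. <!ELEMENT Address (Name, Street, City)> declares that element Address contains three nested tags, Name, Street and City, in that order. <!ELEMENT AddressBook (Address+)> declares that element AddressBook contains one or more occurrences of element Address.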
42
Anatomy of DTD – Dealing with multiple children To declare the children of an element we use syntax similar to regular expression in Perl. To define the children of an element we use the following syntax: (Assume a and b are child elements of the element being declared)
43
a+ : one or more occurrences of a. a* : zero or more occurrences of a. a? : a or nothing. a, b : a followed by b. a | b : a or b, but not both. (expression) : surrounding an expression with parentheses means that it is treated as a unit and may take the suffix operator ?, * or +
44
Some examples
45
Anatomy of DTD – Attribute Declarations Specifies allowable attributes of each element Tag-name: Element name Attr-Name: Name of the attribute, the attribute is defined for element Tag-Name
46
Restriction: Value : Shows a simple text value enclosed in quotes #IMPLIED:Indicates that there is no default value for this attribute, and this attribute need not be used #REQUIRED:Indicates that there is no default value for this attribute, but that a value must be assigned to this attribute #FIXED Value: In this case, Value is the attribute’s value, and the attribute must always have this value
47
Anatomy of DTD – Attribute Declarations Example: <!ATTLIST Name salutation CDATA #REQUIRED> The element Name has an attribute salutation, which is of type CDATA. The attribute salutation must be specified in the Name tag.
48
Anatomy of DTD – Entity Declarations (1 of 2) A way to escape special characters. Some special characters, such as < and &, cannot appear literally in #PCDATA. The escaped form of such a character is called an "entity reference".
49
The following entity references are used in XML documents. Built-in entities: &amp;, &lt;, &gt;, &apos;, &quot; (standing for &, <, >, ', "). Character entities: &#243; representing ó. Example: Jammu &amp; Kashmir
50
Anatomy of DTD – Entity Declarations (2 of 2) Data that is frequently used can be declared as a general entity. Syntax: <!ENTITY entity_name "entity_contents"> entity_name: Name of the new entity. entity_contents: Contents of the new entity.
51
Example: <!ENTITY MyCountry "India"> defines the entity called MyCountry. "India" is the contents of entity MyCountry. Usage in the XML document: &MyCountry;
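A short sketch of how entity references look from the application's side: by the time a DOM parser hands over the text, built-in, character, and general entity references have been replaced by what they stand for. The class name EntityDemo and the element names State, Word, and Country are assumptions for illustration; MyCountry and "India" come from the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityDemo {

    /** Parses the document and returns the text content of its root element. */
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Built-in entity: &amp; stands for the ampersand.
        System.out.println(textOf("<State>Jammu &amp; Kashmir</State>"));

        // Character reference: &#243; stands for the character with code 243.
        System.out.println(textOf("<Word>&#243;</Word>"));

        // General entity declared in an internal DTD, then used in the content.
        String withEntity =
              "<!DOCTYPE Country [ <!ENTITY MyCountry \"India\"> ]>"
            + "<Country>&MyCountry;</Country>";
        System.out.println(textOf(withEntity));
    }
}
```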
52
XML Schema What is XML Schema? An XML vocabulary for expressing your data's structure and business rules Validating parsers can use Schema to check whether XML data adheres to rules in schema More robust and extensive than DTD, can do even data type validations
53
E.g. : Consider following XML Document 45609 Kiran IWT 80 A
54
Is this data valid? To be valid, it must meet the following business rules (constraints): The Result must comprise a Subject, Marks, and Grade in the order shown. The Subject must be a valid subject from the list (DC, IWT, Cryptography). The Marks must be between 0 and 100, and the Grade can be A, B, or C.
55
How can XML schema help to accomplish this? Answer: It creates an XML vocabulary, that is, it defines the set of elements used in the document. It specifies the contents of each element and the restrictions on each element: the Result element must contain Subject, Marks, and Grade in that order; Subject must be one of the valid subjects (IWT, Cryptography, DC); Marks must be between 0 and 100; Grade can be A, B, or C.
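The business rules above translate directly into schema restrictions. The sketch below states them in a small XSD and checks sample documents with the javax.xml.validation API. It is a simplified illustration under stated assumptions: the schema declares no target namespace, the Result element holds only Subject, Marks, and Grade, and the class name ResultSchemaCheck is hypothetical. The deck's own Result.xsd is not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ResultSchemaCheck {

    // Builds an anonymous simple type that allows only the listed string values.
    private static String enumeration(String... values) {
        StringBuilder sb = new StringBuilder("<xsd:simpleType><xsd:restriction base='xsd:string'>");
        for (String value : values) {
            sb.append("<xsd:enumeration value='").append(value).append("'/>");
        }
        return sb.append("</xsd:restriction></xsd:simpleType>").toString();
    }

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='Result'><xsd:complexType><xsd:sequence>"   // order is enforced
        + "<xsd:element name='Subject'>" + enumeration("DC", "IWT", "Cryptography") + "</xsd:element>"
        + "<xsd:element name='Marks'><xsd:simpleType><xsd:restriction base='xsd:float'>"
        + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='100'/>"  // range 0 to 100
        + "</xsd:restriction></xsd:simpleType></xsd:element>"
        + "<xsd:element name='Grade'>" + enumeration("A", "B", "C") + "</xsd:element>"
        + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    /** Returns true if the document satisfies every rule in XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, the validator throws on the first violation.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Result><Subject>IWT</Subject><Marks>80</Marks><Grade>A</Grade></Result>"));
        System.out.println(isValid("<Result><Subject>IWT</Subject><Marks>120</Marks><Grade>A</Grade></Result>"));
    }
}
```

A DTD could enforce the element order but not the numeric range, which is the data type validation the deck attributes to schemas.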
56
XML Schema specifies in which namespace the created vocabulary must be in It is not an actual URL, but uses URL syntax and should be a unique string Example: http://www.Results.com Namespace defines the following vocabulary
57
Example of referring to Schema <res:Result xmlns:res="http://www.Results.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.Results.com Result.xsd"> Kiran 45609 IWT 80.70 A PF 78.30 B+
58
Schema example : Result.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.Results.com" xmlns="http://www.Results.com" elementFormDefault="qualified">
59
Schema example : Result.xsd
60
DTD vs Schema An XML document and its DTD use different syntaxes, which is inconsistent; a Schema uses XML syntax. Limited data type capability: DTDs support only a very limited capability for specifying data types, and they do not support field-level validation or complex types. E.g.: in a DTD you cannot express "I want the element to hold an integer in the range 0 to 100". Schema describes a set of data types compatible with those found in databases. E.g.: a database supports integer, string, etc. data types; Schema supports integer, string, etc., while DTD does not.
61
Element Declarations: Simple Element Syntax : Element_name: Any valid xml name Element_type : Built in Simple type Occurrence : Number of occurrences of that element, optional
62
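Example : <xsd:element name="Name" type="xsd:string"/> defines the element Name of type string. <xsd:element name="Marks" type="xsd:float" maxOccurs="5"/> defines the element Marks of simple type float. Marks may appear at most 5 times and, by default, at least once.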
63
Element Declarations Syntax :
64
Example: Defines a non-reusable complex element called 'Subject'. Its child elements appear in the declared order because the <xsd:sequence> tag is used.
65
Element Declarations: Reusable Simple Type Element_type_name : Name of the data type Base_data_type : Any of the built in simple data type (integer, float etc) Restriction_specification : Specifies restriction on the element if any
66
Example : Defines the reusable element type MarksType Element defined as MarksType may take minimum value of 0.0 and maximum value 100.0
67
Element Declarations: Reusable Complex Type Syntax Defines the reusable type Type_name Example
68
Defines the reusable complex element type SubjectType. It comprises the following elements in the sequence specified (<xsd:sequence> tag): Name, Marks, Grade. This type can be used to define elements in your XML.
69
Defining the Attributes Syntax : Example All attributes are declared as simple types. Only complex elements can have attributes
70
Anatomy of XML Schema : Constraints specification Controls the occurrence of an individual element or a group of elements. Types of constraints: <xsd:choice> allows only one of the listed elements to appear. <xsd:sequence> requires elements to appear in the same order as they are declared. <xsd:all> lets elements occur in any order and, where they are declared optional, in any combination.
71
<xsd:choice> constraint E.g.: Allows either the first or the last name to be used in the instance XML document.
72
<xsd:sequence> constraint E.g.: All elements must appear in the defined order only.
73
Anatomy of XML Schema : Constraints specification <xsd:all> constraint E.g.: Any of the elements can either appear or not appear. Elements may appear in any order.
74
XML Parsers
75
XML Parser : The Big Picture Usage of the XML Parser XML Document XML Parser Client Application API’s Parsed Data XML DTD / Schema
76
Why to use Parser? Typically use a pre-built XML parser (e.g. JAXP, Apache Xerces etc) This enables you to build your application much more quickly
77
Need for Parser Defining the Parser’s Responsibilities Ensure that the document adheres to specific standards Does the document match the DTD or Schema? Is the document well-formed? Make the document contents available to your application The parser will parse the XML document, and make this data available to your application An application using parser can access data in XML by going through the hierarchy or using tag names
78
Types of XML Parsers Validating Parser: a parser that verifies that the XML document adheres to the DTD or Schema. Non-Validating Parser: a parser that does not verify the XML document against the DTD or Schema. Most parsers provide an option to turn validation on or off. All parsers check the well-formedness of the XML document at all times.
79
XML Parser Interfaces Two types of Interfaces provided by XML Parsers SAX An Event Based Interface DOM a Tree Based Interface JAXP “Java API for XML Processing” JAXP is part of JDK Provides parsers which can be used in any Java application It supports both Tree Based Parser : DOM Event Based Parser : SAX
80
DOM Parser Tree Based Parser Definition: Parser reads the XML document, and creates an in-memory “tree” representation of XML Document For example: Given a sample XML document below What kind of tree would be produced?
81
Kiran 45609 CHSSC 80 A
82
In memory tree created by Tree Based Parser Tree represents the hierarchy of XML document
83
DOM Parser [Figure: DOM tree with element nodes Result, Name, and EmpNo and text nodes Kiran and 45609]
84
DOM Parser Tree-based APIs present a memory model of the entire document to an application once parsing has concluded. There is no need to use extra data structures to maintain the information during parsing. An application can navigate through the tree to find the desired pieces of the document. Document Object Model (DOM) is the standard for tree-based parsing of XML documents.
85
Document Object Model (DOM) The Document Object Model (DOM) is a set of interfaces defined by the W3C DOM Working Group. DOM is the tree-based interface used by programmers to manipulate the XML document. A DOM parser can be validating or non-validating. A DOM parser represents the logical model of the XML document in memory. All entity references are expanded before the DOM tree is constructed.
86
DOM Structure representing XML Document [Figure: DOM structure representing Result.xml, with the Document node at the root, element nodes (Result, Name, EmpNo, Subject, Marks, Grade), and text nodes (Kiran, 45609, IWT, 80.0, A). A generic XML document structure is shown alongside, with element, attribute, text, and comment nodes.]
87
Document Object Model (DOM) : Overview The root of the DOM hierarchy is the Document node; its element child is the root (document) element. Example : Result is the root element. Child nodes can be element nodes, comment nodes, etc. Example : Name, Subject, EmpNo, etc. are all child nodes of Result. All the node types in the DOM are derived from the interface org.w3c.dom.Node
88
The Big picture : Parsing the XML Document Document builder factory creates an instance of parser with required characteristics Whether the parser should be validating parser or not Whether namespace support required or not, Whether to ignore the white spaces between the elements or not Factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML (Analogous to JDBC driver)
90
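DomApp.java : Parsing XML Document using DOM Parser
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomApp {
    public static void main(String argv[]) {
        MyErrorHandler hErr;   // application-defined org.xml.sax.ErrorHandler
        Document hDocument;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);       // validate against the DTD / Schema
        factory.setNamespaceAware(true);   // enable namespace support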
91
        try {
            hErr = new MyErrorHandler();
            DocumentBuilder hBuilder = factory.newDocumentBuilder();
            // Set the error handler
            hBuilder.setErrorHandler(hErr);
            hDocument = hBuilder.parse(new File("Result.xml"));
        } catch (Exception e) {
            // Handle exception if generated during parsing
        }
    } // End of function main
}
92
Parsing the XML Document using DOM Parser Step 1: Get the instance of document-builder factory. This will be used to produce the DOM-parser (called DocumentBuilder) DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); Step 2: Set the properties of the DOM parser to be produced a. It should validate the XML Document against the Schema / DTD b. It should be namespace aware factory.setValidating(true); factory.setNamespaceAware(true); Step 3 : Obtain the instance of the MyErrorHandler class This instance handles the error generated during parsing, in application specific way hErr = new MyErrorHandler();
93
Step 4: Obtain the instance of the DOM parser, and register the error handler. This parser will be used to parse the XML document and create the in-memory tree representation of it.
    DocumentBuilder hBuilder = factory.newDocumentBuilder();
    hBuilder.setErrorHandler(hErr);
Step 5: Parse the XML document (Result.xml) using the parser created above.
    hDocument = hBuilder.parse(new File("Result.xml"));
94
The Node interface is the root of DOM Core class hierarchy This interface can be used to extract information from any DOM object without knowing its actual type (e.g. Element node, Text node, Attr Node etc ) of underlying node i.e. It is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface The Class Hierarchy rooted at org.w3c.dom.Node
95
DOM : Exploring the org.w3c.dom.Node Interface [Figure: Node is the parent interface of Element, Document, Attr, Text, Comment, and Entity]
96
DOM : Important Methods of Node interface Methods to retrieve the various information from the XML DOM Tree Node getFirstChild(): Returns the first child of the current node Node getLastChild(): Returns the last child of the current node String getNodeName(): The name of this node String getNodeValue(): The value of this node, depending on its type short getNodeType(): A code representing the type of the underlying object
97
Methods to alter the elements of the XML DOM tree: Node insertBefore(Node newChild, Node refChild), Node appendChild(Node newChild), Node removeChild(Node oldChild), Node replaceChild(Node newChild, Node oldChild)
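The four tree-altering methods can be tried on a small in-memory document. This is a sketch for illustration: the class name DomEdit and the helper childNames are assumptions, and the element names are borrowed from the deck's Result example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomEdit {

    /** Lists the node names of all children of parent, separated by commas. */
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(child.getNodeName());
        }
        return sb.toString();
    }

    /** Applies each altering method once and returns the resulting child names. */
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
                    new InputSource(new StringReader(
                            "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>")));
            Element root = doc.getDocumentElement();

            Element grade = doc.createElement("Grade");
            root.appendChild(grade);                 // Name,EmpNo,Grade

            Element subject = doc.createElement("Subject");
            root.insertBefore(subject, grade);       // Name,EmpNo,Subject,Grade

            root.removeChild(root.getFirstChild());  // EmpNo,Subject,Grade

            Element marks = doc.createElement("Marks");
            root.replaceChild(marks, grade);         // EmpNo,Subject,Marks

            return childNames(root);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

New nodes are created through the owning Document (createElement) before they are attached, since a node cannot be inserted into a tree it does not belong to.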
98
Using Node Interface [Figure: tree with root element Result, child elements Name, EmpNo, and Subject, and text nodes Kiran and 45609]
    Node hNode = hDocument.getDocumentElement();
    Node hFirstChild = hNode.getFirstChild();
    Node hLastChild = hNode.getLastChild();
    hFirstChild = hFirstChild.getFirstChild();
    String sName = hFirstChild.getNodeName();
    String sVal = hFirstChild.getNodeValue();
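The navigation calls on this slide can be run against a compact version of the Result document. The sketch below is illustrative; the class name DomWalk and the helper describe are assumptions. The sample text contains no whitespace between tags, because a parser would otherwise report that whitespace as additional text nodes and getFirstChild() would return one of those instead of Name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomWalk {

    static final String SAMPLE = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Formats a node as name=value using only methods of the Node interface. */
    static String describe(Node node) {
        return node.getNodeName() + "=" + node.getNodeValue();
    }

    public static void main(String[] args) {
        Node root = parse(SAMPLE).getDocumentElement();   // the Result element
        Node first = root.getFirstChild();                // the Name element
        Node last = root.getLastChild();                  // the EmpNo element
        Node text = first.getFirstChild();                // the text node inside Name

        System.out.println(describe(root));
        System.out.println(describe(first));
        System.out.println(describe(last));
        System.out.println(describe(text));
        System.out.println(first.getNodeType() == Node.ELEMENT_NODE);
        System.out.println(text.getNodeType() == Node.TEXT_NODE);
    }
}
```

Element nodes have a name but a null value; the characters themselves live in the child text node, whose name is always #text.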
99
XML Parser Interfaces : Event Based Interface Event Based Interface Definition : Parser reads the XML document and generates events for each parsing step Some common parsing events Element start-tag read Element content read Element end- tag read
100
Example Kiran 45609 CHSSC 80 A
101
XML Parser Interfaces : Event Generated startElement : Result startElement : Name contents: Kiran endElement : Name startElement : EmpNo contents: 45609 endElement : EmpNo endElement : Result
102
XML Parser Interfaces : Event Based Interface For each of these events, your application implements “event handlers” Each time an event occurs, a different event handler is called Your application intercepts these events, and handles them in any way you want Application does not wait till the entire document gets parsed Application has to maintain the information from XML document within local data-structures till it is processed completely Simple API for XML (SAX) is the standard for Event Based parsing of XML document
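The event sequence listed two slides earlier can be reproduced with a handler that simply records each callback. This is a minimal sketch: the class name SaxEventLog is an assumption, and the log format mirrors the wording of the slide. It relies on the parser delivering each short text run in a single characters() call; a parser is allowed to split text across several calls, so production code should accumulate the pieces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventLog {

    /** Parses the document and returns the parsing events in the order they occurred. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only runs such as indentation between tags.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("contents: " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement : " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

The list built here is exactly the kind of local data structure the slide says an event-based application must maintain for itself, since the parser keeps no tree.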
103
SAXApp.java : Parsing XML Document using SAX Parser
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class SAXApp {
    public static void main(String argv[]) {
        // Get the instance of the parser event handling class
        DefaultHandler handler = new Handler();
        // Get the instance of SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Set the properties of the parser to be obtained
            factory.setValidating(true);
            factory.setNamespaceAware(true);
104
            // Get the new SAX Parser
            SAXParser saxParser = factory.newSAXParser();
            // Parse the file
            // handler : processes events generated during parsing
            saxParser.parse(new File("Result.xml"), handler);
        }
        // Handle any exceptions if generated during parsing
        catch (Throwable t) {
            t.printStackTrace();
        }
    } // End of function main
}
105
SAXApp.java : Parsing XML Document using SAX Parser
// Needs org.xml.sax.SAXException, org.xml.sax.SAXParseException
// and org.xml.sax.helpers.DefaultHandler
class Handler extends DefaultHandler {
    // Process any recoverable errors in the XML document
    public void error(SAXParseException e) throws SAXException {
        System.out.println("Error At Line: " + e.getLineNumber());
        System.out.print("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.print(e.getMessage());
    }
    // Process any fatal errors in the XML document
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("Fatal Error At Line: " + e.getLineNumber());
        System.out.print("Column: " + e.getColumnNumber());
        // Print the error message
        System.out.print(e.getMessage());
    }
} // End class Handler
106
Understanding The Simple API for XML (SAX)
Step 1: Get the instance of SAXParserFactory. This instance is used to obtain the SAX parser.
    SAXParserFactory factory = SAXParserFactory.newInstance();
Step 2: Get the instance of the event handler class. This class handles all the events generated by the parser.
    DefaultHandler handler = new Handler();
Step 3: Set the properties of the parser to be obtained: (a) it should validate the XML document against the Schema / DTD; (b) it should be namespace aware.
    factory.setValidating(true);
    factory.setNamespaceAware(true);
Step 4: Obtain the instance of the SAX parser using the factory just obtained.
    SAXParser saxParser = factory.newSAXParser();
Step 5: Parse the Result.xml file using the SAX parser obtained above. Events generated during parsing will be handled by the object handler.
    saxParser.parse(new File("Result.xml"), handler);
107
The Big picture : Parsing the XML Document using SAX [Figure: the SAX Parser Factory creates a parser that reads the XML document and sends parser events to DefaultHandler/MyHandler, which implements the org.xml.sax interfaces ContentHandler, ErrorHandler, and EntityResolver]
108
org.xml.sax Interfaces org.xml.sax.helpers.DefaultHandler Class Provides a default implementation of all the event callbacks. DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with do-nothing methods). Only the methods that are required need to be overridden.
109
org.xml.sax.ContentHandler Interface Receives notification of the logical content of a document. Defines methods like startDocument(), endDocument(), startElement(), and endElement(), which are invoked when the corresponding parts of the document and its XML tags are recognized. Also defines the method characters(), which is invoked when the parser encounters text in an XML element.
110
org.xml.sax Interfaces org.xml.sax.ErrorHandler Interface Allows SAX application to do customized error handling The parser will then report all errors and warnings through this interface
111
Important Methods void error(SAXParseException e): receives notification of a recoverable error. void fatalError(SAXParseException e): receives notification of a non-recoverable error. void warning(SAXParseException e): receives notification of a warning.
112
Evaluating Parsers : SAX vs. DOM SAX Advantage: It is good when the document must be processed serially and is very large, e.g. when the size of the XML document is measured in GBs. Disadvantage: The application must maintain its own data structures to hold parts of the XML document until processing is complete, which is extra work that rarely pays off for small XML documents.
113
DOM Advantage Supports DOM Tree Traversing methods Allows modification of XML Document Good when the random access of a document is required Disadvantage For large XML documents (size in GBs) requires more memory as compared to memory required to parse XML document using SAX Parser.