Published by Pamela Logan. Modified over 9 years ago.
1
C# and Windows Programming
XML Processing
2
Contents
Markup
XML
DTDs
XML Parsers
DOM
3
Markup
When we write text, it is just text. For example:
John Smith 123 Main St. Toronto Ontario
We can all read this and understand it.
A computer cannot; it needs additional information.
4
Markup
Markup is added to documents in the form of tags.
A tag consists of text delimited by angle brackets.
The name of the tag identifies it and the information it conveys.
5
Markup
Let’s add some semantic markup to our address:
John Smith 123 Main St. Toronto Ontario
This identifies the information in the various parts of the address.
6
Markup
You will notice:
Tags occur in pairs: a start tag, and a matching end tag with a "/" before the tag name.
The text that the tags describe is enclosed between the start tag and the end tag.
A single tag is placed around the entire document.
Every start tag having a matching end tag, correctly nested, is what makes the document well-formed.
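As a rough illustration of well-formedness, the sketch below asks the .NET DOM parser to load two strings and reports whether each one parses. The tag names address and name are hypothetical, chosen only for this example; the helper IsWellFormed is likewise not part of any library.

```csharp
using System;
using System.Xml;

class WellFormedCheck
{
    // Hypothetical helper: returns true when the text parses as well-formed XML.
    static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects unmatched or badly nested tags.
            return false;
        }
    }

    static void Main()
    {
        // Start and end tags match and nest inside one root element.
        Console.WriteLine(IsWellFormed("<address><name>John Smith</name></address>")); // True

        // The name element is never closed, so the document is not well-formed.
        Console.WriteLine(IsWellFormed("<address><name>John Smith</address>"));        // False
    }
}
```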
7
XML
XML is the latest in a long line of markup languages.
It is the eXtensible Markup Language.
Unlike other markup languages, it lets you define your own tags.
Any meaning associated with those tags is imposed by your program.
8
Uses of XML
SOAP (Simple Object Access Protocol), a type of remote procedure call
Configuration files
Web services
Security information
Electronic document exchange
9
Defining Documents
If you can define your own tags, how do you know what should be in a document?
Document Type Definition (DTD): defines the allowable tags and their order. It is similar to a BNF grammar.
Schema: like a DTD, it describes the tags and their order. It also describes the content which can be placed within the tags.
10
XML Structure
Here is a simple XML document:
John Smith 123 Main St. Toronto Ontario
11
Attributes
A tag can also have attributes, which provide additional information about the tag:
Toronto
A tag can have zero or more attributes.
12
The XML Declaration
The first line is the optional XML declaration. It consists of:
<?xml
Identifies this as the XML declaration.
version="1.0"
The version of XML used in the document.
13
The XML Declaration
encoding="ISO-8859-1"
The character set used in the document. Various character sets can be used, including Unicode (UTF-8), an international character set.
standalone="no"
Indicates whether the document uses any external entities which are defined in other files. This will be discussed later in the course.
In general, the order of attributes is not important, but it is in the XML declaration.
14
The DOCTYPE Declaration
The optional DOCTYPE declaration follows the XML declaration.
It is required only if you want to validate the document against a definition of the tags in the document.
15
The Root Element
This is the element which begins the document.
It is the first element in the document.
It contains all other elements in the document.
16
Elements
An element consists of a start tag, character data, and an end tag:
John Smith
A tag name must start with a letter or underscore.
A tag name cannot contain spaces, and colons are reserved for namespace prefixes.
The end tag must match the start tag exactly, including case.
17
Mixed Content
If an element contains just text, it has simple content:
John Smith
If it contains a mix of text and elements, it is said to have mixed content:
these are nested correctly
18
Attributes
Attributes are name-value pairs which can be added to elements.
Attributes allow you to provide additional information without changing the tag itself.
The names for attributes follow the same rules as tag names.
Every attribute name within the same tag must be unique.
19
Attributes
accountant
sales
Note that both of these elements contain a name attribute.
That is OK, since the attributes are in separate elements.
Attribute values are placed in either single or double quotes.
20
Comments
Comments are delimited by special brackets: <!-- opens a comment and --> closes it.
Comments can:
Add explanations
Temporarily remove XML which is not needed for a while
21
Entities
The less-than and greater-than signs delimit tags.
What if you want to type these symbols in a document and not have them delimit a tag?
Then enter them as entities. To enter a less-than sign, type &lt;
All entities are referenced using:
&
The entity name
;
22
Entities
Entity    Symbol    Description
&lt;      <         Less than
&gt;      >         Greater than
&amp;     &         Ampersand
&quot;    "         Double quote
&apos;    '         Apostrophe
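The sketch below shows, under the assumption that the .NET XmlDocument class is used for parsing, how entity references are turned back into their characters when a document is read. The element name note is hypothetical.

```csharp
using System;
using System.Xml;

class EntityDemo
{
    static void Main()
    {
        var doc = new XmlDocument();
        // "note" is a hypothetical element name used only for this example.
        doc.LoadXml("<note>5 &lt; 7 &amp; 8 &gt; 7</note>");

        // The parser replaces each entity reference with the character it names.
        Console.WriteLine(doc.DocumentElement.InnerText);   // prints: 5 < 7 & 8 > 7

        // Serializing the element escapes the special characters again.
        Console.WriteLine(doc.DocumentElement.OuterXml);
    }
}
```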
23
CDATA
Sometimes using entities is not enough, since you have many special characters to type.
A CDATA section, which starts with <![CDATA[ and ends with ]]>, allows you to enter anything without having special characters interpreted.
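A minimal sketch of a CDATA section created through the .NET DOM follows. The element name code and the text placed inside it are hypothetical; the point is only that the special characters pass through without being escaped.

```csharp
using System;
using System.Xml;

class CDataDemo
{
    static void Main()
    {
        var doc = new XmlDocument();

        // "code" is a hypothetical element name.
        XmlElement code = doc.CreateElement("code");
        doc.AppendChild(code);

        // The text contains < and &, which would otherwise need entities.
        code.AppendChild(doc.CreateCDataSection("if (a < b && b < c) { }"));

        // prints: <code><![CDATA[if (a < b && b < c) { }]]></code>
        Console.WriteLine(doc.OuterXml);

        // prints: if (a < b && b < c) { }
        Console.WriteLine(code.InnerText);
    }
}
```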
24
Document Type Definitions
The DTD is one way to describe what should be in a valid XML document.
There are other ways, which we will examine later in the course.
A DTD:
Describes each element and the elements which can occur within it
Describes the attributes for each element
Describes entities which can be used in the document
25
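Person DTD
<!DOCTYPE person [
<!ELEMENT person (first, last, gender, employee-id)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT employee-id (#PCDATA)>
]>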
26
Reading the DTD
There is an element person containing the elements first, last, gender, and employee-id, in that order.
These elements are described below it.
Each of these contains PCDATA, meaning parsed character data.
This means that these elements contain only text, not nested tags.
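One possible way to check a document against a DTD like this in .NET is a validating XmlReader; the sketch below is an assumption about how that could be set up, not the course's own example. The DOCTYPE name is written as person so that it matches the root element, which a validating parser requires, and the sample data deliberately omits employee-id so that a validation error is reported.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class DtdValidationDemo
{
    static void Main()
    {
        // Internal DTD subset followed by a person that lacks employee-id.
        string xml = @"<!DOCTYPE person [
  <!ELEMENT person (first, last, gender, employee-id)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT gender (#PCDATA)>
  <!ELEMENT employee-id (#PCDATA)>
]>
<person><first>John</first><last>Smith</last><gender>M</gender></person>";

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // allow the DOCTYPE to be read
            ValidationType = ValidationType.DTD    // validate against it
        };

        // Called once for each validity problem instead of throwing.
        settings.ValidationEventHandler += (sender, e) =>
            Console.WriteLine("Invalid: " + e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Reading to the end drives validation of the whole document.
            while (reader.Read()) { }
        }
    }
}
```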
27
XML Parsers
There are two types of XML parsers.
DOM (the Document Object Model): parses the document into a tree-like structure called a DOM. The document is parsed all at once.
SAX (Simple API for XML): a sequential parser which executes a callback when each part of the document is recognized. This is good for very large documents, since the entire document does not have to be in memory at once.
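The .NET class library does not include a SAX parser. Its closest counterpart is XmlReader, a forward-only pull parser: the program asks for the next node instead of receiving callbacks, but the document is likewise never held in memory as a whole. The sketch below contrasts the two styles on a small hypothetical document.

```csharp
using System;
using System.IO;
using System.Xml;

class ParserStyles
{
    static void Main()
    {
        // Hypothetical sample document.
        string xml = "<person><first>John</first><last>Smith</last></person>";

        // DOM style: the whole document is parsed into a tree first.
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        Console.WriteLine("DOM root: " + doc.DocumentElement.Name);

        // Streaming style: nodes are visited one at a time, in document order.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    Console.WriteLine("Start of element: " + reader.Name);
                else if (reader.NodeType == XmlNodeType.Text)
                    Console.WriteLine("Text: " + reader.Value);
            }
        }
    }
}
```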
28
What is DOM?
DOM is an in-memory data structure.
It describes an XML document as a tree structure.
The nodes in the tree are described by the interface to them.
This means that there can be many implementations of the interface.
29
So, how do we make a document into a tree?
[Tree diagram. Node labels: Document, friend, whitespace, handle, Harold, whitespace, degree, close. Legend: Root, Element, Text, Attribute.]
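To see such a tree in practice, the sketch below loads a small document and prints every node with its type. The XML is a hypothetical stand-in built from the names in the diagram (friend, handle, degree, Harold, close), since the original markup is not reproduced here. PreserveWhitespace is switched on because XmlDocument otherwise drops whitespace-only nodes.

```csharp
using System;
using System.Xml;

class TreeDump
{
    // Prints a node, its attributes, and then its children, indented by depth.
    static void Dump(XmlNode node, int depth)
    {
        string indent = new string(' ', depth * 2);
        Console.WriteLine(indent + node.NodeType + ": " + node.Name);

        // Attributes is null for every node type except elements.
        if (node.Attributes != null)
        {
            foreach (XmlAttribute attr in node.Attributes)
                Console.WriteLine(indent + "  Attribute: " + attr.Name + "=" + attr.Value);
        }

        foreach (XmlNode child in node.ChildNodes)
            Dump(child, depth + 1);
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true;   // keep whitespace nodes in the tree

        // Hypothetical document; the real slide markup is unknown.
        doc.LoadXml("<friend>\n  <handle degree=\"close\">Harold</handle>\n</friend>");

        Dump(doc, 0);
    }
}
```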
30
Nodes
All nodes in a DOM implement the Node interface.
All other interfaces in the tree extend the Node interface.
This means that every node can be treated as a Node, and maybe more.
31
XmlNode
Represents every node in the DOM.
Properties:
ParentNode
Name
FirstChild
NextSibling
PreviousSibling
Value
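The properties above are enough to walk a tree by hand. The following sketch steps through the children of a root element with FirstChild and NextSibling; the person document is a hypothetical example.

```csharp
using System;
using System.Xml;

class NodeProperties
{
    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample document.
        doc.LoadXml("<person><first>John</first><last>Smith</last></person>");

        // Start at the first child and follow the sibling links until they end.
        for (XmlNode n = doc.DocumentElement.FirstChild; n != null; n = n.NextSibling)
        {
            // An element's own Value is null; its text lives in a child text node.
            Console.WriteLine(n.Name + " (parent: " + n.ParentNode.Name + ") = " + n.FirstChild.Value);
        }
        // prints:
        // first (parent: person) = John
        // last (parent: person) = Smith
    }
}
```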
32
XmlNode
Methods:
InsertBefore()
AppendChild()
RemoveChild()
Clone()
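A short sketch of these four methods follows, again on a hypothetical person document. It inserts one child before another, appends and then removes a child, and finally clones the element.

```csharp
using System;
using System.Xml;

class NodeMethods
{
    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample document.
        doc.LoadXml("<person><last>Smith</last></person>");
        XmlNode person = doc.DocumentElement;
        XmlNode last = person.FirstChild;

        // InsertBefore: place a new first element ahead of last.
        XmlElement first = doc.CreateElement("first");
        first.InnerText = "John";
        person.InsertBefore(first, last);

        // AppendChild adds at the end; RemoveChild takes the node out again.
        XmlElement gender = doc.CreateElement("gender");
        person.AppendChild(gender);
        person.RemoveChild(gender);

        // Clone produces a copy that is not attached to the tree.
        XmlNode copy = person.Clone();

        // prints: <person><first>John</first><last>Smith</last></person>
        Console.WriteLine(person.OuterXml);
        Console.WriteLine(copy.ParentNode == null);   // True
    }
}
```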
33
XmlDocument
The node above the root node of the document.
Can be used to represent an empty document.
Properties:
DocumentElement
Methods:
CreateElement()
CreateTextNode()
GetElementsByTagName()
Load()
Save()
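As an illustration of these members, the sketch below starts from an empty XmlDocument, builds a small tree with the Create methods, and writes it to the console. The element names are hypothetical, and the deck's own DocBuilder example may be organized differently.

```csharp
using System;
using System.Xml;

class BuildDocument
{
    static void Main()
    {
        // A new XmlDocument represents an empty document.
        var doc = new XmlDocument();

        // Nodes are created by the document, then attached to the tree.
        XmlElement person = doc.CreateElement("person");
        doc.AppendChild(person);

        XmlElement first = doc.CreateElement("first");
        first.AppendChild(doc.CreateTextNode("John"));
        person.AppendChild(first);

        Console.WriteLine(doc.DocumentElement.Name);                   // person
        Console.WriteLine(doc.GetElementsByTagName("first").Count);    // 1

        // Save accepts a file name, a stream, or a TextWriter such as Console.Out.
        doc.Save(Console.Out);
    }
}
```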
34
XmlElement
This represents an element.
An element can have attributes.
Properties:
XmlAttributeCollection Attributes
Methods:
GetElementsByTagName()
SetAttribute(string name, string value)
string GetAttribute(string name)
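The attribute methods can be tried out as in the sketch below. The employee element and its attribute values are hypothetical.

```csharp
using System;
using System.Xml;

class ElementAttributes
{
    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical element; note the single-quoted attribute value.
        doc.LoadXml("<employee name='Ann'/>");
        XmlElement employee = doc.DocumentElement;

        // SetAttribute adds the attribute, or overwrites it if it already exists.
        employee.SetAttribute("role", "accountant");

        Console.WriteLine(employee.GetAttribute("name"));           // Ann
        Console.WriteLine(employee.GetAttribute("role"));           // accountant

        // A missing attribute yields an empty string rather than null.
        Console.WriteLine(employee.GetAttribute("missing") == "");  // True

        Console.WriteLine(employee.Attributes.Count);               // 2
    }
}
```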
35
XmlAttribute
This is an attribute.
Can have either Text nodes or EntityReferences as children.
The Name property gets the name.
The Value property gets the value.
36
XmlText
This is the node representing text.
The text has no markup.
Even whitespace is represented as a text node.
37
CDATASection Interface
This is a CDATA section.
It is similar to a text node, but the content undergoes no interpretation.
38
Other Node Subinterfaces
Comment
Notation
Entity
EntityReference
ProcessingInstruction
These all correspond directly to the same constructs in XML.
39
Other Node Subinterfaces
DocumentFragment: part of a document tree which can be inserted into another tree.
DOMImplementation: provides the capabilities of the implementation. Has the method for creating a document.
40
Other Node Subinterfaces
DOMException: signals that something went wrong.
NodeList: a list of nodes which has an iterator.
NamedNodeMap: a map structure holding a collection of nodes.
41
Common .NET DOM Classes
XmlNode is the base class.
XmlDocument, XmlElement, XmlText, and XmlAttribute derive from it.
42
XmlNodeList
A list of nodes.
Returned by GetElementsByTagName().
Properties:
Count: the number of nodes in the list
Indexer: retrieves a node
Methods:
Item(int n): retrieves a node
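A brief sketch of the list members follows, using a hypothetical staff document. The deck's NodeLister example is not reproduced here and may differ.

```csharp
using System;
using System.Xml;

class NodeListDemo
{
    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical sample document.
        doc.LoadXml("<staff><employee>Ann</employee><employee>Bob</employee></staff>");

        XmlNodeList employees = doc.GetElementsByTagName("employee");

        Console.WriteLine(employees.Count);              // 2
        Console.WriteLine(employees[0].InnerText);       // Ann (indexer)
        Console.WriteLine(employees.Item(1).InnerText);  // Bob (Item method)

        // The list can also be enumerated directly.
        foreach (XmlNode node in employees)
            Console.WriteLine(node.Name + ": " + node.InnerText);
    }
}
```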
43
XmlNamedNodeMap
A map of nodes indexed by name.
Superclass of XmlAttributeCollection.
Returned by the Attributes property.
Properties:
Count
Methods:
Item(int n)
GetNamedItem(string name)
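The sketch below looks up attributes through the map, by name and by position. The employee element and its attributes are hypothetical.

```csharp
using System;
using System.Xml;

class NamedNodeMapDemo
{
    static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical element with two attributes.
        doc.LoadXml("<employee name='Ann' role='sales'/>");

        // Attributes returns an XmlAttributeCollection, which is an XmlNamedNodeMap.
        XmlNamedNodeMap map = doc.DocumentElement.Attributes;

        Console.WriteLine(map.Count);                            // 2
        Console.WriteLine(map.GetNamedItem("role").Value);       // sales
        Console.WriteLine(map.Item(0).Name);                     // name

        // Looking up a name that is not present returns null.
        Console.WriteLine(map.GetNamedItem("missing") == null);  // True
    }
}
```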
44
Examples
* see NodeLister
* see DocBuilder