Extensible Markup Language: XML
HTML: a portable, widely supported protocol for describing how to format data.
XML: a portable, widely supported protocol for describing data.
XML is quickly becoming a standard for data exchange between applications.

XML Documents
XML marks up data using tags, which are names enclosed in angle brackets.
All tags appear in pairs: a start tag and a matching end tag, <name> .. </name>.
Elements: units of data (i.e., anything between a start tag and its corresponding end tag).
The root element contains all other document elements.
Tag pairs cannot appear interleaved: <a><b> .. </a></b> is not allowed. It must be <a><b> .. </b></a>.
Nested elements form trees.
What defines an XML document is not its tag names but that its tags are formatted in this way.

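The nesting rule can be checked with a parser. A minimal sketch, assuming Python's standard xml.dom.minidom module; the helper name is_well_formed and the tag names a and b are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


nested = "<a><b>data</b></a>"        # tag pairs properly nested
interleaved = "<a><b>data</a></b>"   # tag pairs interleaved

print(is_well_formed(nested))        # True
print(is_well_formed(interleaved))   # False
```

The parser rejects the interleaved version because the end tag it meets first does not match the most recently opened start tag.
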
The optional XML declaration includes a version information parameter, e.g. <?xml version="1.0"?>, and MUST be the very first line of the file.
Because of this nested start-tag/end-tag structure, the data can be viewed as organized in a tree:
article
  title
  date
  author
    firstName
    lastName
  summary
  content

dna
Aspergillus awamori
U03518
aacctgcggaaggatcattaccgagtgcgggtcctttgggccca
acctcccatccgtgtctattgtaccctgttgcttcgg
cgggcccgccgcttgtcggccgccgggggggcgcctctg
ccccccgggcccgtgcccgccggagaccccaacacgaac
actgtctgaaagcgtgcagtctgagttgattgaatgcaat
cagttaaaactttcaacaatggatctcttggttccggc
An I-sequence might be structured as XML like this..
[XML listing; surviving labels: SEQUENCEDATA, TYPE, SEQ, DATA, ID, NAME, comment]

Parsing and displaying XML
XML is just another data format. Do we need to write yet another parser? (No more filters, please!)
No! XML is becoming a standard.
Many different systems can read XML; not many systems can read our I-sequence format.
Thus, parsers exist already.

XML document opened in Internet Explorer
Standard browsers can format XML documents nicely!
Each parent element/node can be expanded and collapsed using the plus sign or minus sign shown next to it.

XML document opened in Mozilla
Again: each parent element/node can be expanded and collapsed (here by pressing the minus, not the element).

Attributes
Data can also be placed in attributes: name/value pairs, with the value in quotes.
In letter.xml, the element contact has the attribute type, which has the value "to".
Empty elements are elements with no character data between the tags.
The two tags of an empty element may be written as one, in the form <name/>.

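Attributes and empty elements can be read back with a DOM parser. A minimal sketch, assuming Python's standard xml.dom.minidom module. Only the contact element with type="to" comes from the slide; the rest of the fragment (letter, name, flag and the text inside) is a hypothetical stand-in, not the contents of letter.xml.

```python
from xml.dom import minidom

# Hypothetical fragment: only <contact type="to"> is taken from the slide.
xml_text = """<letter>
  <contact type="to">
    <name>Jane Doe</name>
  </contact>
  <flag/>
</letter>"""

doc = minidom.parseString(xml_text)

# Look up the attribute value by its name.
contact = doc.getElementsByTagName("contact")[0]
print(contact.getAttribute("type"))   # to

# An empty element has no child nodes at all.
flag = doc.getElementsByTagName("flag")[0]
print(flag.hasChildNodes())           # False
```
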
Parsers and trees
We've already seen that XML markup can be displayed as a tree.
Some XML parsers exploit this. They
- parse the file
- extract the data
- return it organized in a tree data structure called a Document Object Model
article
  title
  date
  author
    firstName
    lastName
  summary
  content

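The tree can be recovered from the markup with a few lines of code. A minimal sketch, assuming Python's standard xml.dom.minidom module; the function name element_outline is made up, and the document is an inline stand-in for the article example with text values copied in the abbreviated form used on the slides.

```python
from xml.dom import minidom

# Stand-in for the article document; text values are abbreviated as on the slides.
xml_text = """<article>
  <title>Simple XML</title>
  <date>Dec</date>
  <author>
    <firstName>John</firstName>
    <lastName>Doe</lastName>
  </author>
  <summary>XML.. easy.</summary>
  <content>In this..XML.</content>
</article>"""


def element_outline(node, depth=0):
    """Return the element names below node, indented by depth (hypothetical helper)."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
        for child in node.childNodes:
            lines.extend(element_outline(child, depth + 1))
    return lines


doc = minidom.parseString(xml_text)
print("\n".join(element_outline(doc.documentElement)))
```

Text nodes are skipped here, so the printed outline contains only the element names, one per line, indented by depth.
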
Document Object Model (DOM)
A DOM parser retrieves the data from an XML document and returns a tree structure called a DOM tree.
Each component of an XML document is represented as a tree node.
A single root node (the document node) contains all other nodes.

Python provides a DOM parser!
All nodes have a name (of the tag) and a value (the data).
Text (including whitespace) is represented in nodes with the tag name #text.
article
  title
    #text: Simple XML
  date
    #text: Dec
  author
    firstName
      #text: John
    lastName
      #text: Doe
  summary
    #text: XML.. easy.
  content
    #text: In this..XML.

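The whitespace #text nodes are easy to overlook. A minimal sketch, assuming Python's standard xml.dom.minidom module and using only the author part of the article example as an inline string.

```python
from xml.dom import minidom

# Only the author subtree of the article example, with line breaks and indentation.
xml_text = "<author>\n  <firstName>John</firstName>\n  <lastName>Doe</lastName>\n</author>"

doc = minidom.parseString(xml_text)
author = doc.documentElement

# Every child has a name; whitespace between the tags shows up as #text nodes.
for child in author.childNodes:
    print(child.nodeName, repr(child.nodeValue))

# The data of an element sits in the #text node below it.
first_name = author.getElementsByTagName("firstName")[0]
print(first_name.firstChild.nodeValue)   # John
```

In minidom the nodeValue of an element node is None; the text itself is the nodeValue of the #text child.
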
fig16_04revised.py
Parse the XML document and load the data into the variable document.
The documentElement attribute refers to the root node.
nodeName refers to an element's tag name.
Various node attributes: firstChild, nextSibling, nodeValue, parentNode.
NB: Changes since book!

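A minimal sketch of a program along these lines, assuming Python's standard xml.dom.minidom module. It is not the fig16_04revised.py listing itself: the document is an inline stand-in for the article example, with text values copied in the abbreviated form used on the slides.

```python
from xml.dom import minidom

# Stand-in for the article document (assumption: the real program reads a file).
xml_text = """<?xml version="1.0"?>
<article>
  <title>Simple XML</title>
  <date>Dec</date>
  <author>
    <firstName>John</firstName>
    <lastName>Doe</lastName>
  </author>
  <summary>XML.. easy.</summary>
  <content>In this..XML.</content>
</article>"""

# Parse the XML document and load the data into the variable document.
document = minidom.parseString(xml_text)

# documentElement refers to the root node.
root = document.documentElement

# firstChild is a whitespace #text node; its nextSibling is the title element.
first = root.firstChild
title = first.nextSibling
print("The first child of root element is:", first.nodeName)
print("whose next sibling is:", title.nodeName)
print('Text inside "title" tag is', title.firstChild.nodeValue)
print("Parent node of title is:", title.parentNode.nodeName)

# nodeName refers to each node's tag name (or #text for text nodes).
print("Here is the root element of the document:", root.nodeName)
print("The following are its child elements:")
for node in root.childNodes:
    print(node.nodeName)
```
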
Program output
The first child of root element is: #text
whose next sibling is: title
Text inside "title" tag is Simple XML
Parent node of title is: article
Here is the root element of the document: article
The following are its child elements:
#text
title
#text
date
#text
author
#text
summary
#text
content
#text

Summary
XML is widely used.
Many applications can read XML.
Python already has an XML parser which returns a tree.

.. on to the exercises