Python and XML Styling and other issues XML

Slides:



Advertisements
Similar presentations
XML: text format Dr Andy Evans. Text-based data formats As data space has become cheaper, people have moved away from binary data formats. Text easier.
Advertisements

SPECIAL TOPIC XML. Introducing XML XML (eXtensible Markup Language) ◦A language used to create structured documents XML vs HTML ◦XML is designed to transport.
XSLT (eXtensible Stylesheet Language Transformation) 1.
XML Parsing Using Java APIs AIP Independence project Fall 2010.
31 Signs That Technology Has Taken Over Your Life: #6. When you go into a computer store, you eavesdrop on a salesperson talking with customers -- and.
Java API for XML Processing (JAXP) CSE 4/586: Distributed Systems Department of Computer Science and Engineering University at Buffalo, New York Jia Zhao.
Fonts and colors Times New Roman “quotes” Trebuchet "quotes" yellow blue pink green violet.
28-Jun-15 StAX Streaming API for XML. XML parser comparisons DOM is Memory intensive Read-write Typically used for documents smaller than 10 MB SAX is.
29-Jun-15 JAXB Java Architecture for XML Binding.
CS 898N – Advanced World Wide Web Technologies Lecture 22: Applying XML Chin-Chih Chang
COS 381 Day 16. Agenda Assignment 4 posted Due April 1 There was no resubmits of Assignment Capstone Progress report Due March 24 Today we will discuss.
Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
XML: Java Dr Andy Evans. Java and XML Couple of things we might want to do: Parse/write data as XML. Load and save objects as XML. We’ll mainly discuss.
Technical Track Session XML Techie Tools Tim Bornholt.
PHP and XML TP2653 Advance Web Programming. PHP and XML PHP5 – XML-based extensions, library and functionalities (current XAMPP PHP version is )
Chapter 12 Creating and Using XML Documents HTML5 AND CSS Seventh Edition.
XML Fundementals XML vs.. HTML XML vs.. HTML XML Document (elements vs. attributes) XML Document (elements vs. attributes) XML and RDBMS XML and RDBMS.
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
Introduction to XML cs3505. References –I got most of this presentation from this site –O’reilly tutorials.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
XML eXtensible Markup Language by Darrell Payne. Experience Logicon / Sterling Federal C, C++, JavaScript/Jscript, Shell Script, Perl XML Training XML.
XML and its applications: 4. Processing XML using PHP.
XML eXtensible Markup Language w3c standard Why? Store and transport data Easy data exchange Create more languages WSDL (Web Service Description Language)
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
XML Parsers Overview  Types of parsers  Using XML parsers  SAX  DOM  DOM versus SAX  Products  Conclusion.
XML TUTORIAL Portions from w3 schools By Dr. John Abraham.
DOM Robin Burke ECT 360. Outline XHTML in Schema JavaScript DOM (MSXML) Loading/Parsing Transforming parameter passing DOM operations extracting data.
Intro to XML Originally Presented by Clifford Lemoine Modified by Box.
Openadaptor XML Support Using openadaptor for XML processing Oleg Dulin,
Softsmith Infotech XML. Softsmith Infotech XML EXtensible Markup Language XML is a markup language much like HTML Designed to carry data, not to display.
Copyrighted material John Tullis 10/17/2015 page 1 04/15/00 XML Part 3 John Tullis DePaul Instructor
Windows Presentation Foundation (WPF) Chapter 16 Dr. Abraham.
1 Introduction  Extensible Markup Language (XML) –Uses tags to describe the structure of a document –Simplifies the process of sharing information –Extensible.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
Jennifer Widom XML Data Introduction, Well-formed XML.
XML and Its Applications Ben Y. Zhao, CS294-7 Spring 1999.
XML Study-Session: Part III
XML and SAX (A quick overview) ● What is XML? ● What are SAX and DOM? ● Using SAX.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
Well Formed XML The basics. A Simple XML Document Smith Alice.
What is XML? eXtensible Markup Language eXtensible Markup Language A subset of SGML (Standard Generalized Markup Language) A subset of SGML (Standard Generalized.
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
XML & JSON. Background XML and JSON are to standard, textual data formats for representing arbitrary data – XML stands for “eXtensible Markup Language”
XML 1.Introduction to XML 2.Document Type Definition (DTD) 3.XML Parser 4.Example: CGI Gateway to XML Middleware.
Week-9 (Lecture-1) XML DTD (Data Type Document): An XML document with correct syntax is called "Well Formed". An XML document validated against a DTD is.
1 Introduction to XML Babak Esfandiari. 2 What is XML? introduced by W3C in 98 Stands for eXtensible Markup Language it is more general than HTML, but.
XML BASICS and more…. What is XML? In common:  XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Unit 4 Representing Web Data: XML
XML Parsers.
Java XML IS
Intro to XML.
Application with Cross-Platform GUI
XML in Web Technologies
The XML Language.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
XML Data Introduction, Well-formed XML.
XML Data DTDs, IDs & IDREFs.
More Sample XML By Sadia Anjum.
Fundamentals of Data Structures
XML Problems and Solutions
XML Dr Andy Evans This lecture we'll talk about XML, a common text format used for complicated datasets, especially those with meaning embedded in the.
XML Dr Andy Evans This lecture we'll talk about XML, a common text format used for complicated datasets, especially those with meaning embedded in the.
XML and its applications: 4. Processing XML using PHP
XML Parsers.
XML Programming in Java
Extensible Markup Language (XML)
XML and Web Services (II/2546)
Presentation transcript:

Python and XML Styling and other issues XML So, how do we process XML in Python.

XML Parsing Two major choices: Document Object Model (DOM) / Tree-based Parsing: The whole document is read in and processed into a tree-structure that you can then navigate around, either as a DOM (API defined by W3C) or bespoke API. The whole document is loaded into memory. Stream based Parsing: The document is read in one element at a time, and you are given the attributes of each element. The document is not stored in memory. We have two broad methods by which we might parse XML programmatically. The first is that we read the whole file in and convert it to a tree, either following the DOM standard or some other tree which is more or less efficient in some way. We can then navigate around that tree. The problem with this is that it involves reading in the whole tree, which for very large files could be an extreme amount of memory. Instead, we can engage it stream-based parsing. This reads and processes one element at a time, rather than the whole document. The disadvantage is that you only understand the element you are processing (or anything you've recorded from those before), not the whole document structure.

Stream-based parsing Trade off control and efficiency. Stream-based Parsing divided into: Push-Parsing / Event-based Parsing (Simple API for XML: SAX) The whole stream is read and as an element appears in a stream, a relevant method is called. The programmer has no control on the in-streaming. Pull-Parsing: The programmer asks for the next element in the XML and can then farm it off for processing. The programmer has complete control over the rate of movement through the XML. Trade off control and efficiency. Stream parsing is broadly divided into SAX and pull-parsing. SAX is a semi-formal community standardised API which allows a program to pull in XML at processing speed (i.e. it does it as fast as it can, with no control from the programmer) and farm elements out to functions depending on their tags. Pull parsing is similar, but the programmer gains control over the speed, asking for one element at a time rather than it simply running through tags as fast as possible.

Standard library xml library contains: xml.etree.ElementTree :parse to tree xml.dom :parse to DOM xml.dom.minidom :lightweight parse to DOM xml.sax :SAX push and pull parser xml.parsers.expat :SAX-like push and pull parser xml.dom.pulldom :pull in partial DOM trees The standard Python library contains six key packages for processing XML matching these models.

Other libraries lxml : simple XML parsing Can be used with SAX (http://lxml.de/sax.html) but here we'll look at simple tree-based parsing. However, there are simpler XML libraries which are very popular. Here we'll mainly look at lxml, a simple xml parser distributed with Anaconda.

Validation using lxml Against DTD: Against XSD: dtd_file = open("map1.dtd") xml1 = open("map1.xml").read() dtd = etree.DTD(dtd_file) root = etree.XML(xml1) print(dtd.validate(root)) Against XSD: xsd_file = open("map2.xsd") xml2 = open("map2.xml").read() xsd = etree.XMLSchema(etree.parse(xsd_file)) root = etree.XML(xml2) print(xsd.validate(root)) To validate XML, we can use lxml, a lightweight XML parser that also does validation. It will validate against both DTD and XSD, the only difference being you need to parse the XSD to read it in. The validate function will take in an XML tree and validate it against the schema, reporting whether the XML matches the schema, and what is wrong if it doesn't. Note that lxml doesn't like the encoding element of prologs: <?xml version="1.0" encoding="UTF-8"?> so these need removing before validation, using the Python string.remove function or similar. Note extra step of parsing the XSD XML

Parsing XML using lxml root = etree.XML(xml1) # Where xml1 is XML text print (root.tag) # "map" print (root[0].tag) # "polygon" print (root[0].get("id")) # "p1" print (root[0][0].tag) # "points" print (root[0][0].text) # "100,100 200,100" etc. Parsing XML read from a file (here xml1 is read as the previous slide) creates a tree of Python objects. The root object is the root of the XML - here we use our example XML from earlier, so the root is the "map" tags. For each element we can ask for the tag using ".tag", specific attributes using "get", or objects within the object by treating it as a list or list of lists. Finally we can get the text content of elements using ".text". <map> <polygon id="p1"> <points>100,100 200,100 200, 200 100,000 100,100</points> </polygon> </map>

Generating XML using lxml root = etree.XML(xml1) # Could start from nothing p2 = etree.Element("polygon") # Create polygon p2.set("id", "p2"); # Set attribute p2.append(etree.Element("points")) # Append points p2[0].text = "100,100 100,200 200,200 200,100" # Set points text root.append(p2) # Append polygon print (root[1].tag) # Check Equally, we can create XML using the tree once it is parsed. Here we see how to add a new polygon, essentially mirroring our reading on the last slide. Here we see two structures for making XML objects: we can make a separate object and append it (p2) or we can make and append it at the same time (points)).

Print XML Pretty print puts linebreaks between objects etc. out = etree.tostring(root, pretty_print=True) print(out) writer = open('xml3.xml', 'wb') # Open for binary write writer.write(out) writer.close() Pretty print puts linebreaks between objects etc. If we have an XML tree (here "root" from the previous slides) and we want to print it or write it out, we can use the "tostring" function. This can be set to "pretty print". "Pretty print" is a broader Python idea; it means to print a set of objects with nice line breaks, tabbing, etc. Note that lxml works with characters read as binary entities to avoid thinking about their character sets; this means you need to write them back out as binary.

Transform XML xsl3 = open("map3.xsl").read() # Read stylesheet xslt_root = etree.XML(xsl3) # Parse stylesheet transform = etree.XSLT(xslt_root) # Make transform result_tree = transform(root) # Transform some XML root transformed_text = str(result_tree) print(transformed_text) writer = open('map3.html', 'w') # Normal writer writer.write(transformed_text) Finally, the above code reads in an XML stylesheet, parses it, and turns it into a Transform object. This has an overridden constructor-like method that will convert an associated XML file into transformed text. The above uses the example that generates HTML from earlier. Note that if the XML is from a file it doesn't need the XSL is referenced in the XML, a major advantage in applying arbitrary stylesheets. Note that if the XML is from a file it doesn't need the XSL is referenced in the XML, a major advantage in applying arbitrary stylesheets.

Other libraries dicttoxml : conversion of dicts to XML untangle : library for converting DOMs to object models Not distributed with Anaconda, but worth looking at. Nice intro by Kenneth Reitz at: http://docs.python-guide.org/en/latest/scenarios/xml/ To round off, here's a couple of popular libraries that aren't distributed with Anaconda, but which are worth checking out.