XML and its applications: 4. Processing XML using PHP
Advantages of XML
- open standard: platform and vendor independent
- structure: human and machine readable
- family of technologies: XQuery, XSLT, XPath, XSD, SOAP, etc.
- decouples data from applications
  - data lives longer than code (the legacy problem)
  - data first, schema later (pay as you go)
- covers the spectrum from unstructured to structured data
  - potentially all data (evolving model)
  - potentially all information (data, metadata, code, documents)
Why XML Processing?
Needed for:
- domain-specific applications: publishing, the web, communication and networking, security, databases, etc.
- implementing new generic tools
Important constituents (e.g. in the DOM framework):
- parsing XML documents into XML trees
- navigating through XML trees
- manipulating XML trees
- serializing XML trees as XML documents
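The four constituents can be seen together in one short script. This is a minimal sketch using PHP's DOM extension; the inline `<wines>` document is a hypothetical stand-in for wines.xml, with its structure (category elements with a name attribute, product children with a title) assumed from the later slides.

```php
<?php
// Hypothetical sample document (structure assumed from the wines.xml examples)
$xml = '<wines><category name="Red"><product><title>Rioja</title></product></category></wines>';

// 1. Parse: XML text -> in-memory tree
$doc = new DOMDocument();
$doc->loadXML($xml);

// 2. Navigate: find the first <category> element
$category = $doc->getElementsByTagName('category')->item(0);

// 3. Manipulate: append <product><title>Barolo</title></product>
$product = $doc->createElement('product');
$title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('Barolo'));
$product->appendChild($title);
$category->appendChild($product);

// 4. Serialize: tree -> XML text (saveXML() also emits the XML declaration)
$out = $doc->saveXML();
echo $out;
```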
XML Processing Models: Pull & Push
Pull:
- tree-oriented parsers: the parser reads the whole document into memory and converts it into a tree structure; the application controls the processing flow (e.g. the SimpleXML extension)
- streaming parsers: the parser reads the document iteratively, so processing can begin before the parser has finished reading the document (e.g. the XMLReader extension, from PHP 5.2 on)
Push:
- the parser traverses the document in node order and fires processing events to the application when some condition is met, e.g. the beginning or end of a node
- the application has no control over requesting events; instead the parser passes them on as they become available
- the most popular push-based API is the Simple API for XML [SAX2]
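PHP also ships a push parser in its expat-style xml extension (`xml_parser_create()`, `xml_set_element_handler()`, `xml_set_character_data_handler()`, `xml_parse()`). The sketch below is an illustration, not the slides' own code: the application only registers callbacks and the parser calls them as it meets start tags, end tags and text. The function name `pushParseTitles` and the sample document are hypothetical.

```php
<?php
// Push (SAX-style) parsing: the parser calls our handlers, we never ask for nodes.
// pushParseTitles() is a hypothetical helper that collects every <title> text.
function pushParseTitles(string $xml): array
{
    $titles = [];
    $inTitle = false;
    $buffer = '';

    $parser = xml_parser_create('UTF-8');
    // Keep element names as written (case folding to upper case is on by default)
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // Fired by the parser at every start tag
        function ($p, string $name, array $attrs) use (&$inTitle, &$buffer) {
            if ($name === 'title') {
                $inTitle = true;
                $buffer = '';
            }
        },
        // Fired by the parser at every end tag
        function ($p, string $name) use (&$inTitle, &$buffer, &$titles) {
            if ($name === 'title') {
                $titles[] = $buffer;
                $inTitle = false;
            }
        }
    );
    // Fired for text; one text node may arrive in several chunks, so append
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$inTitle, &$buffer) {
            if ($inTitle) {
                $buffer .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $titles;
}

// Hypothetical sample data
$xml = '<wines><category name="Red"><product><title>Rioja</title></product>'
    . '<product><title>Barolo</title></product></category></wines>';
print_r(pushParseTitles($xml));
```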
Document Object Model (DOM)
A specification for a programming interface (API) from the W3C (currently at Level 3):
"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page." (W3C)
The ext/dom extension provides the DOM, XML Schema, RelaxNG, XPath and XPointer functionality of the libxml2 library in PHP 5.
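The XPath functionality of ext/dom is exposed through the `DOMXPath` class, which queries a `DOMDocument` directly. A minimal sketch, assuming a wines-style document inlined as a string (hypothetical data):

```php
<?php
// Hypothetical stand-in for wines.xml
$doc = new DOMDocument();
$doc->loadXML(
    '<wines>'
    . '<category name="Red"><product><title>Rioja</title></product></category>'
    . '<category name="White"><product><title>Chablis</title></product></category>'
    . '</wines>'
);

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList: titles of products in the "White" category
$titles = [];
foreach ($xpath->query('//category[@name="White"]/product/title') as $node) {
    $titles[] = $node->textContent;
}
echo implode(', ', $titles), "\n";

// evaluate() can also return scalar results, such as a count (a float)
$count = $xpath->evaluate('count(//product)');
echo $count, "\n";
```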
DOM example with PHP 5
<?php
// Load wines.xml into a DOM tree
$wines = new DOMDocument();
$wines->load('wines.xml');
$categories = $wines->getElementsByTagName('category');
foreach ($categories as $category) {
    echo "<p>";
    $name = $category->getAttribute('name');
    echo "<b>Type: $name</b><br/>";
    // Print the <title> of every <product> in this category
    $products = $category->getElementsByTagName('product');
    foreach ($products as $product) {
        $titles = $product->getElementsByTagName('title');
        $title = $titles->item(0)->nodeValue;
        echo "$title<br/>";
    }
    echo "</p>";
}
?>
SimpleXML extension in PHP 5
The DOM is full-featured and allows XML to be manipulated in a wide variety of ways, but it is quite complex: it is a large API that requires the developer to understand all the intricate details of working with XML.
The SimpleXML extension is much simpler and takes an object-oriented (OO) approach, viewing the XML document as an object. Child elements are represented as properties, and attributes are accessed with array syntax. It includes full iterator support, so elements can be traversed with foreach loops. It can hand nodes over to the DOM when required (via dom_import_simplexml()) and has built-in XPath support.
SimpleXML works best with uncomplicated, record-like data, such as XML passed as a document or string from another internal part of the same application. Provided that the XML document isn't too complicated or too deep and has no mixed content, SimpleXML is easier to code than the DOM.
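A small sketch of the object view described above: child elements read as properties, an attribute read with array syntax, a child added and serialized with `asXML()`, and the same node handed to the DOM with `dom_import_simplexml()` for an operation SimpleXML lacks (removing a node). The sample document is hypothetical.

```php
<?php
// Hypothetical sample data
$wines = simplexml_load_string(
    '<wines><category name="Red"><product><title>Rioja</title></product></category></wines>'
);

$category = $wines->category[0];              // first <category> element
$name = (string) $category['name'];           // attribute via array syntax
$title = (string) $category->product->title;  // nested elements as properties

// Simple writing: add a child element, then serialize with asXML()
$product = $category->addChild('product');
$product->addChild('title', 'Barolo');
$xml = $wines->asXML();

// Hand the node to the DOM (same underlying tree, no copy) to remove a child
$domCategory = dom_import_simplexml($category);
$domCategory->removeChild($domCategory->firstChild); // drops the Rioja product

echo $name, ' ', $title, "\n";
echo $wines->asXML();
```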
SimpleXML example with PHP 5
<?php
// Load wines.xml as a SimpleXMLElement
$wines = simplexml_load_file('wines.xml');
foreach ($wines->category as $category) {
    echo "<p>";
    // Attributes are read with array syntax
    echo "<b>Type: " . $category['name'] . "</b><br/>";
    // Child elements are read as properties
    foreach ($category->product as $product) {
        echo $product->title, '<br />';
    }
    echo "</p>";
}
?>
Using XPath to search documents
Sometimes a document has to be searched, e.g. when the exact format or order of the nodes is not known in advance, or when only a small set of nodes needs to be found in a very large document. The XPath support built into the SimpleXML extension makes this easy.
The basic form of an XPath query is "a/b/c", where a, b and c are nested XML tags of the form <a><b><c/></b></a>. Some common XPath queries are:
- a : matches any tag named a
- a/b : matches any tag named b directly contained in a tag a
- a/b/.. : matches b but returns the enclosing tag a instead
- a//b : matches any tag b that is a descendant of a
- a[31] : matches the 31st a tag (counted among its siblings)
- a[last()] : matches the very last a tag (among its siblings)
- a[@att] : matches any a tag with an attribute named "att"
- a[@att="val"] : matches any a tag with an attribute named "att" whose value is "val"
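The query forms listed above can be tried against a tiny document with `SimpleXMLElement::xpath()`. In this sketch the document and the helper `labels()` are hypothetical; each match is labelled with its element name plus its `id` attribute (if any), so the results show which `<b>` was selected.

```php
<?php
// Hypothetical document: <a> with two <b> children, and one <b> nested deeper
$doc = simplexml_load_string(
    '<a att="val"><b id="1"><c/></b><b id="2"><d><b id="3"/></d></b></a>'
);

// Hypothetical helper: label each matched element as name + id, e.g. "b2"
function labels(SimpleXMLElement $doc, string $query): array
{
    $result = [];
    foreach ($doc->xpath($query) ?: [] as $node) {
        $result[] = $node->getName() . (string) $node['id'];
    }
    return $result;
}

$queries = [
    '/a/b',              // b directly inside a
    '/a/b/c',            // c nested in b nested in a
    '/a/b/..',           // parent of the matched b elements
    '/a//b',             // any b descendant of a
    '/a/b[2]',           // second b child of a
    '/a/b[last()]',      // last b child of a
    '/a[@att]',          // a with an att attribute
    '/a[@att="val"]',    // a whose att attribute equals "val"
    '/a[@att="other"]',  // no match
];
foreach ($queries as $q) {
    printf("%-18s %s\n", $q, implode(', ', labels($doc, $q)));
}
```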
SimpleXML & XPath example
<?php
$wines = simplexml_load_file('wines.xml');
// Find every <category> element anywhere in the document
$categories = $wines->xpath('//category');
foreach ($categories as $c) {
    echo "<p>";
    echo "<b>Type: {$c['name']}</b><br/>";
    // Relative query: the titles of this category's products
    $titles = $c->xpath('./product/title');
    foreach ($titles as $t) {
        echo "{$t}<br/>";
    }
    echo "</p>";
}
?>
XSLT processing using PHP
<?php
// create a new XSLT processor (requires the xsl extension)
$xp = new XSLTProcessor();
// create a DOM document and load the XSL stylesheet
$xsl = new DOMDocument();
$xsl->load('patient.xslt');
// import the XSL stylesheet into the XSLT processor
$xp->importStylesheet($xsl);
// create a DOM document and load the XML data
$xml_doc = new DOMDocument();
$xml_doc->load('patient.xml');
// transform the XML into HTML using the XSL stylesheet
if ($html = $xp->transformToXML($xml_doc)) {
    echo $html;
} else {
    trigger_error('XSL transformation failed.', E_USER_ERROR);
}
?>
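To run a transformation without the patient.xml and patient.xslt files, both documents can be loaded from strings. This sketch assumes the xsl extension is enabled; the patient markup and the stylesheet are hypothetical stand-ins, not the originals. It also shows `XSLTProcessor::setParameter()`, which overrides a top-level `xsl:param` from PHP.

```php
<?php
// Hypothetical patient data (stand-in for patient.xml)
$xml = new DOMDocument();
$xml->loadXML('<patients><patient><name>Jane Doe</name></patient>'
    . '<patient><name>John Roe</name></patient></patients>');

// Hypothetical stylesheet (stand-in for patient.xslt) with one parameter
$xsl = new DOMDocument();
$xsl->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:param name="heading" select="'Patients'"/>
  <xsl:template match="/patients">
    <div>
      <h1><xsl:value-of select="$heading"/></h1>
      <ul>
        <xsl:for-each select="patient">
          <li><xsl:value-of select="name"/></li>
        </xsl:for-each>
      </ul>
    </div>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$xp = new XSLTProcessor();
$xp->importStylesheet($xsl);
// Override the top-level xsl:param; '' means "no namespace"
$xp->setParameter('', 'heading', 'Ward 3');

$html = $xp->transformToXML($xml);
if ($html === false) {
    throw new RuntimeException('XSL transformation failed.');
}
echo $html;
```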
Validating XML using the DOM
<?php
$dom = new DOMDocument();
$dom->load('wines.xml');
// Check the document against the XML Schema in wines.xsd
if (!$dom->schemaValidate('wines.xsd')) {
    echo 'Document is not valid';
} else {
    echo 'Document is valid';
}
?>
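`schemaValidate()` only returns true or false; the reasons for a failure are reported by libxml2 as PHP warnings. A sketch, assuming a small hypothetical schema inlined as a string, that validates with `schemaValidateSource()` and collects the messages through `libxml_use_internal_errors()` and `libxml_get_errors()` instead. The helper name `validate()` is hypothetical.

```php
<?php
// Hypothetical schema: <wine> must contain <title> (string) then <year> (integer)
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="wine">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: returns [bool $valid, string[] $messages]
function validate(string $xml, string $xsd): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors, no warnings
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $valid = $dom->schemaValidateSource($xsd);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$valid, $messages];
}

[$ok, $errors] = validate('<wine><title>Rioja</title><year>2019</year></wine>', $xsd);
echo $ok ? "valid\n" : "invalid\n";

[$ok, $errors] = validate('<wine><title>Rioja</title><year>old</year></wine>', $xsd);
echo $ok ? "valid\n" : "invalid\n";
print_r($errors);
```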
Pull Parsing with XMLReader
PHP 5 introduced XMLReader, a new class for parsing XML. Unlike SimpleXML or the Document Object Model (DOM), XMLReader operates in streaming mode: it reads the document from start to finish, and processing can start before the parser has finished reading the document.
Unlike the Simple API for XML (SAX), XMLReader is a pull parser rather than a push parser, so the processing program is in control. Rather than being told what the parser sees when the parser sees it, the program tells the parser when to fetch the next piece of the document. The program requests content rather than reacting to it.
Another way of thinking about it: XMLReader is an implementation of the Iterator design pattern rather than the Observer design pattern.
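Because the program is in control, it can also decide what not to read. A sketch under stated assumptions: `XMLReader::next('category')` jumps from one `<category>` to the next sibling without reading its subtree, and only the wanted category is materialised (via `readOuterXml()`) as a small SimpleXML object. The helper `titlesInCategory()` and the inline document are hypothetical.

```php
<?php
// Hypothetical helper: titles of products in the category named $wanted
function titlesInCategory(string $xml, string $wanted): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $titles = [];

    // Pull nodes until the first <category> start tag
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'category') {
            break;
        }
    }

    // Visit sibling <category> elements; next() skips each subtree unread
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'category') {
        if ($reader->getAttribute('name') === $wanted) {
            // Expand only this subtree into a SimpleXML object
            $category = simplexml_load_string($reader->readOuterXml());
            foreach ($category->product as $product) {
                $titles[] = (string) $product->title;
            }
        }
        if (!$reader->next('category')) {
            break;
        }
    }
    $reader->close();
    return $titles;
}

// Hypothetical sample data
$xml = '<wines>'
    . '<category name="Red"><product><title>Rioja</title></product></category>'
    . '<category name="White"><product><title>Chablis</title></product>'
    . '<product><title>Soave</title></product></category>'
    . '</wines>';
print_r(titlesInCategory($xml, 'White'));
```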
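Pull Parsing with XMLReader Example
<?php
$reader = new XMLReader();
$reader->open('wines.xml');
// Pull one node at a time; the loop decides what to do with each
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        if ($reader->localName == 'category') {
            echo "<p><b>" . $reader->getAttribute('name') . "</b><br/>";
        } elseif ($reader->localName == 'title') {
            // Move to the text node inside <title> and print it
            $reader->read();
            echo $reader->value . "<br/>";
        }
    } elseif ($reader->nodeType == XMLReader::END_ELEMENT
              && $reader->localName == 'category') {
        echo "</p>";
    }
}
$reader->close();
?>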