Copyright © Software Insight Querying and Transforming XML with.NET
Copyright © Software Insight Software Consultant –12+ Years programming, engineering, management experience –C#, VB.NET, C++, Java Contributing Editor: –asp.netPRO –C#PRO –Visual Studio Magazine –.NET Magazine Authored –.NET and COM: Working Together (MightyWords press 2001) –Teach Yourself DirectX 7 in 24 Hours, Sams Press (co-author) Participates in the.NET design reviews Speaker at conferences and user groups –INETA Speakers Bureau Contact at or About Brian Noyes
Copyright © Software Insight Gameplan Quick Intro to XML Technologies Loading XML Query DOM Documents Working with XPathDocuments Transforming XML with XslTransform
Copyright © Software Insight XML Technologies Overview XML is the basis for many emerging technologies and capabilities Hype clouds the issue –XML itself is very simple –derivative technologies and use are where the complexity lies XML forms the glue that allows systems to interoperate
Copyright © Software Insight XML Uses XML is used extensively for –Data definition –B2B communication –Messaging technologies –Structured storage of data
Copyright © Software Insight XML Derivative Technologies Many derivative technologies and grammars continue to be defined and evolve: –XML Schema 1.0 –XPath 1.0 –XSLT 1.0 –XPointer –XLink –SOAP –RSS –WebDAV –…
Copyright © Software Insight What is XML? eXtensible Markup Language Simple syntax for data –Self describing markup –Strict but simple rules define a document –Human readable text –Derived from SGML As is HTML
Copyright © Software Insight XML Syntax Rules –Case sensitive –A document is composed of nodes Everything in a document is a node –A document has one and only one root node That root node is an element –Elements have both start and end tags –Elements are properly nested –Elements can contain attributes, child elements, or text –Special node definitions for specific purposes: Comments, CDATA, Processing Instructions, Entities
Copyright © Software Insight Sample XML Document Going Under Going Under Bring Me To Life Bring Me To Life Everybody's Fool Everybody's Fool
Copyright © Software Insight XML Namespaces Used to scope named objects in a document Similar to programming language namespaces Consist of an arbitrary string Should be unique to ensure no name clashes with other XML documents –Two conventions exist for namespaces to try to ensure uniqueness: URL: Used because domain names are guaranteed to be unique in the Internet space xmlns=“ URN: arbitrary naming with specific syntax xmlns=“urn:idesign-net”
Copyright © Software Insight XML Namespaces Namespaces can be declared at document or element scope –Child nodes are scoped to the default namespace of their parent node Prefixes are used to allow scoping to multiple namespaces with shorthand notation <Root xmlns=" xmlns:xsd=" xmlns:foo="urn:yadda-foobar-skwee"> <!-- Unscoped elements are assumed to be defined in namespace --> <!-- Now unscoped elements are assumed to be defined in urn:idesign-net namespace --> DeDoDoDoDeDaDaDa
Copyright © Software Insight Commonly Used XML Grammars in.NET XML Schemas –Used to define the structure / type system of an XML document XPath –Used for querying XML documents XSLT –Used to transform XML into other document formats
Copyright © Software Insight XML Schemas XML Schemas define: –Allowable structure of an XML document What elements and attributes are allowed where and in what order –The type system of the XML document The elements and attributes represent data, so the organization of those nodes and the kind of data contained in them defines the type system of the document XML Documents are instances of a particular schema
Copyright © Software Insight XML Schemas Three generations to contend with: –Document Type Definitions (DTDs) –XML Data Reduced (XDR) –XML Schema 1.0 Definition Language (XSD)
Copyright © Software Insight Document Type Definitions Structural definition of an XML document Not XML itself Limited type definition capability Quickly becoming obsolete <!DOCTYPE Music [ ]>
Copyright © Software Insight XML Data Reduced (XDR) XML Schema Definition Language (XDR) –Early attempt by Microsoft to standardize an XML-based schema language based on the W3C Schema Recommendation before W3C approval –Not as expressive as XML Schema 1.0 <Schema xmlns="urn:idesign.net-music" xmlns:dt="urn:schemas-microsoft-com:datatypes">
Copyright © Software Insight XML Schema 1.0 (XSD) The formal recommendation approved by W3C Includes: –Primitive type definitions –Simple and complex type definitions –Element and attribute declarations –Namespace scoping –Inheritance, includes, imports The default schema mechanism used by.NET
Copyright © Software Insight XML Schema 1.0 Example
Copyright © Software Insight XPath Overview Query Syntax for XML –Can think of XPath as the SQL for XML –Provides common syntax and semantics for location specifications in a document for both XSLT and XPointer –Navigates the hierarchical structure of a document –Represents a document as a tree of nodes
Copyright © Software Insight XPath Expressions Evaluate to a node set, boolean, number, or string Expressions are evaluated based on a current context or position within a document –Kind of like a cursor in a database Not XML themselves
Copyright © Software Insight XPath Expression Composition XPath Expressions are composed of location steps –Each location step has Axis (optional/implied) Node Test Predicate –Results from one location step are used as the context node for the next location step –Location steps are separated by slashes locstep1/locstep2/locstep3/…
Copyright © Software Insight XPath Expression Composition Axes to define a relative path –descendant::, child::, parent::, etc. –Implicit child:: if not provided –Shorthand operators for common axes: / - root // - descendant. – current node.. – parent node Node Test defines a node to match against –Element, attribute, built-in function Predicate defines conditional logic to determine matches –Can define multiple predicates Implicit AND of separate predicates –Can define complex predicates Boolean logic, built-in functions, extension functions
Copyright © Software Insight XPath and Namespaces Nodes referred to in an XPath expression are frequently scoped to a namespace Up to the processor to decide what to do about it –Usually will only match against properly scoped node names with namespaces Can include namespace prefixes in node names to scope to their namespace –Processor has to be made aware of the namespace URI associated with the prefix somehow Use XmlNamespaceManager in.NET to create collection of prefix/URI associations in the NameTable of the processor object
Copyright © Software Insight XPath Examples Examples: Music/Artist /Music/child::Artist descendant::*[parent::Album] = 2] = 2][contains(text(),'Life')] = 2]
Copyright © Software Insight XSLT Overview Extensible Stylesheet Language Transformations Defines transformations for XSL –Conversions from one document structure to another –Can be used to output document formats suitable for presentation (i.e. HTML, RTF, PDF, etc.) XSLT transforms source trees into result trees –Uses pattern associations specified in XPath –Output determined by templates Fragments of markup that are output based on the matching patterns encountered in the source document match attributes use XPath expressions to evaluate source document node matches –Based on the current context in the document –Selection paths are relative to the current node Really a programming language unto itself
Copyright © Software Insight XSLT Constructs XSLT instructions –Declare templates –Apply templates –Call templates –Declare/pass arguments/parameters –Get the value of a matched node –Copy a node –Evaluate functions –Control program flow for-each, if, choose-when-otherwise –Emit nodes Elements, attributes, etc.
Copyright © Software Insight XSLT Example Artists
Copyright © Software Insight Managing XML in.NET Lots of support for XML in.NET –DOM Support XmlDocument –Data-Oriented Support DataSet / XmlDataDocument –XML Navigation Support XPathNavigator / XPathDocument –Streaming Support XmlReader / XmlWriter –Serialization Support XmlSerialization SOAP formatted object serialization –XML Web Services
Copyright © Software Insight Loading XML Document Object Model (DOM) Oriented –XmlDocument.Load() Load from disk or stream –XmlDocument.LoadXml() Load from a string representation of XML document Data Oriented –DataSet.ReadXml() Loads XML that is suitable for representation as a DataSet Will infer a schema if one is not present –XmlDataDocument constructor Takes a DataSet and provides an XmlDocument derived class
Copyright © Software Insight Loading XML XPath Oriented –XPathDocument() constructor Load from stream, uri, reader XSLT Oriented –XslTransform.Load() Load from URL Stream Oriented –Stream + XmlTextReader Read only/forward only node access from a stream –Document + XmlNodeReader Read only/forward only node access from a document
Copyright © Software Insight XmlResolver Handles all aspects of resolving external entities referred to by an XML Document –DTDs, Entities, schemas, stylesheets Handles security –Logs you into remote locations Handles negotiating network protocols to access entities Optional argument to most XML load operations
Copyright © Software Insight Performing DOM Queries XmlNode class methods –SelectNodes() –SelectSingleNode() XmlDocument class methods –GetElementsByTagName() –GetElementById()
Copyright © Software Insight XmlNode.SelectNodes() Easiest access queries when working with DOM documents Takes an XPath expression Optionally takes an XmlNamespaceManager to support namespace prefixed node names Returns an XmlNodeList –Iterate through the results and operate on them accordingly Provides Read/Write access to nodes in the DOM –Abstract base class – actually an XPathNodeList instance Uses an XPathNavigator under the covers –Derived DocumentXPathNavigator object
Copyright © Software Insight XmlNode.SelectNodes() // Load the document XmlDocument doc = new XmlDocument(); doc.Load("C:\\demos\\MusicBase.xml"); // Call SelectNodes with a valid XPath Query XmlNodeList nodes = doc.SelectNodes( foreach (XmlNode node in nodes) { Console.WriteLine(node.Value); }
Copyright © Software Insight XmlNode.SelectSingleNode() Used to obtain a single match for a query Actually just calls SelectNodes and returns first node in list –Not very efficient, query could return a large result set –Just a convenience method Improving performance: use smarter queries –Construct an XPath query that will only return a single node Smart knowledge of the target data [postion() = 1] predicate
Copyright © Software Insight XmlNode.SelectSingleNode() // Load the document XmlDocument doc = new XmlDocument(); doc.Load("C:\\demos\\MusicBase.xml"); // Select a single node XmlNode dumb = doc.SelectSingleNode( XmlNode smart = doc.SelectSingleNode( Console.WriteLine(dumb.InnerText); Console.WriteLine(smart.InnerText);
Copyright © Software Insight Other DOM Query Options XmlDocument.GetElementsByTagName() –Convenience method for getting collections of elements –Uses internal XmlDocument maintained element tables to obtain –Does not allow you to control level of element selected in heirarchy XmlDocument.GetElementByID() –Takes an ID value and returns the element whose ID attribute matches –Only works with DTD schema definition associated with document –Not very useful
Copyright © Software Insight Beware of XmlNodeList.Count XmlNodeList is an abstract base class XPathNodeList is returned from SelectNodes() –Very lightweight list implementation You may be tempted to check the Count property to see how many nodes you got –Triggers an extremely inefficient walk and test of every node in the node set to calculate the count
Copyright © Software Insight XPathNodeList.Count public virtual int get_Count() { if (!(this.done)) this.ReadUntil( ); return this.list.Count; } internal int ReadUntil(int index) { int local0; XmlNode local1; local0 = this.list.Count; while (!(this.done) && index >= local0) { if (this.iterator.MoveNext()) { local1 = this.GetNode(this.iterator.Current); if (local1 == null) continue; this.list.Add(local1); local0++; continue; } this.done = 1; break; } return local0; }
Copyright © Software Insight XmlDataDocument Bridges the XML and relational views of data –Easy access to XML functionality on the data Queries, transformation, validation –Easy access to relational functionality with the data.NET Databinding Derives from XmlDocument –“Is a” XmlDocument –Looks like a DataSet when needed through a property Everything covered about XmlDocument applies –Its just a derived class –Has a specialized XPathNavigator implementation
Copyright © Software Insight Enter the.NET XPath Processing Engine.NET provides a new model for working with XML for navigation and querying Other models good for what they were designed for –DOM is heavyweight, read/write, random access, handles all XML node types Rich access to complex object model –XmlReader is lightweight, read-only, forward-only, handles all XML Node Types Fast serialized access for getting data in memory Need something that gives the best of both worlds
Copyright © Software Insight XPathNavigator Abstract base class used to define XML navigation and query API for XML in.NET –Document classes that support implement IXPathNavigable Lightweight object model –Hierarchical tree based –Restricted to just elements, attributes, namespaces Name and value Read-Only –Controlled through the API defined by the XPathNavigator methods Random-Access –Can move forward or backward in document, traverse up or down levels easily
Copyright © Software Insight XPathDocument Document class for loading XML into memory for an XPathNavigator to operate against Constructor takes stream, URL, or reader Loads XML into memory in a much lighter weight object model than XmlDocument Used to construct an XPathNavigator through CreateNavigator() call –Actually an XPathDocumentNavigator instance
Copyright © Software Insight XPath Objects
Copyright © Software Insight XPathNavigator Methods MoveXXX Methods –Allows forward/backward navigation amongst the nodes SelectXXX Methods –Select() – primary XPath query evaluation method, returns iterator for node set results –SelectAncenstors()/SelectChildren()/SelectDescendants() Easier selection along particular axes Others –Compile() – Pre-compile XPathExpression for rapid repeated execution –Evaluate() – Similar to Select(), but works for queries that do not return node sets (string, bool, number) –Clone() – Get a copy of the iterator for navigating child nodes –GetAttribute()/GetNamespace() – access node info –Positional checks
Copyright © Software Insight XPathNavigator Properties NodeType Name – fully qualified LocalName – no namespace NamespaceURIValue Prefix – namespace prefix NameTableHasAttributesHasChildrenEtc.
Copyright © Software Insight XPathExpressions Class to support declaring and compiling XPath Queries Not created directly, returned from XPathNavigator.Compile() Only way to associate namespace information with a query
Copyright © Software Insight XPathNodeIterators What gets returned from a call to Select() Lightweight iterator on the resulting node-set Access the current node with Current property Loop through with MoveNext() Count property is pre-calculated
Copyright © Software Insight Querying with XPathNavigator Load the XML into a document class Get an XPathNavigator for the document Optionally compile an XPath expression for reuse Call Select() to get an XPathNodeIterator or Call Evaluate() to get a value back –Returned as an object, must cast to the expected type –Can also return node set iterators this way too
Copyright © Software Insight Simple XPathNavigator Query // Load the document object XPathDocument doc = new XPathDocument("C:\\demos\\MusicBase.xml"); // Get a navigator for the document XPathNavigator nav = doc.CreateNavigator(); // Perform the query XPathNodeIterator iter = // Move through the results with the iterator while (iter.MoveNext()) { Console.WriteLine(iter.Current.Value); }
Copyright © Software Insight Working with Namespaces Create an XmlNamespaceManager using the NameTable of the navigator Add namespace prefix/URI combos Compile the XPath expression Set the XPathExpression context
Copyright © Software Insight Working with Namespace // Load the document XPathDocument doc = new XPathDocument("C:\\demos\\MusicDefNs.xml"); // Get a navigator for the document XPathNavigator nav = doc.CreateNavigator(); // Compile the query with namespace prefixes XPathExpression exp = // Create the namespace manager XmlNamespaceManager mgr = new XmlNamespaceManager(nav.NameTable); // Add an alias for the default namespace mgr.AddNamespace("foo","urn:foo"); // Set the context on the expression exp.SetContext(mgr); // Perform the query XPathNodeIterator iter = nav.Select(exp); // Loop through the results while (iter.MoveNext()) { Console.WriteLine(iter.Current.Value); }
Copyright © Software Insight Advanced Topics Sorting the node set –Use the AddSort methods of the XPathExpression object Specify a sort order or a comparer object Creating custom navigator objects –Can implement IXPathNavigable on any object that you want to expose with an XML representation –Need to implement all the move/select/etc. functionality yourself
Copyright © Software Insight Transforming XML Simple conceptual process –Take one document –Apply rules and matching to parts of the document –Produce a new document form that is equivalent or a subset or different representation of the source document XML -> HTML XML Schema A -> XML Schema B XML -> RTF, PDF, text, etc.
Copyright © Software Insight XslTransform Class XslTransform class provides a powerful XSLT processing engine for.NET –Load from multiple kinds of sources XPath documents, XML readers, URL, File –Transforms to multiple output targets File, URL, Stream, Text/XML Writers –Can take argument lists –Can process embedded scripts C#, Visual Basic, Jscript –Can pass managed objects to the template and allow their methods to be called
Copyright © Software Insight Simple XSLT Transformations Load the XSLT file into an XslTransform instance Call Transform(), passing the source document and the output target // Load the transform document XslTransform xslt = new XslTransform(); xslt.Load("C:\\demos\\MusicTransform.xslt"); // Tranform the source xslt.Transform("C:\\demos\\MusicBase.xml", "C:\\demos\\Music.html",new XmlUrlResolver());
Copyright © Software Insight Transforming with Arguments XSLT templates can take parameters that can be declared inline or can be passed in when the template is called XslTransform supports passing in arguments from the calling code –Must comply with the allowable XSLT types: bool, string, number, XPathNodeIterator Process: –Construct an XslArgumentList –Add parameters to it with the AddParams method –Associate it with the transform in the call to Transform()
Copyright © Software Insight Working with Arguments // Load the transform document XslTransform xslt = new XslTransform(); xslt.Load("C:\\demos\\MusicTransform.xslt"); // Construct the argument list XsltArgumentList xargs = new XsltArgumentList(); xargs.AddParam("ShowTracks",String.Empty,false); // Construct the document to contain the source XPathDocument doc = new XPathDocument("C:\\demos\\Music.xml"); // Construct the output stream FileStream fso = File.OpenWrite("C:\\demos\\Music.html"); // Tranform the source xslt.Transform(doc,xargs,fso,new XmlUrlResolver()); fso.Close();
Copyright © Software Insight XSLT Extensibility –Allows jscript or vbscipt to be declared as part of the template <msxsl:script> –Allows embedded.NET language code –C#, VB.NET, Jscript Extension Objects –Can call methods in managed objects from within the XSLT template –Powerful mechanism for extensibility –Better maintainability and reuse –All of.NET Framework capability available to you
Copyright © Software Insight Extension Objects // Load the transform document XslTransform xslt = new XslTransform(); xslt.Load("C:\\demos\\ExtensionObject.xslt"); // Create the extension object DocGenerator dg = new DocGenerator(); // Construct the argument list XsltArgumentList xargs = new XsltArgumentList(); xargs.AddExtensionObject("urn:idesign-net", dg); // Construct the document to contain the source XPathDocument doc = new XPathDocument("C:\\demos\\Empty.xml"); // Tranform the source xslt.Transform(doc,xargs,fso,new XmlUrlResolver());
Copyright © Software Insight Extension Objects public class DocGenerator { public XPathNodeIterator GetDoc() { XmlDocument doc = new XmlDocument(); doc.LoadXml(" Fred ” + “ Alberta "); XPathNavigator nav = doc.CreateNavigator(); return nav.Select("*"); } <xsl:stylesheet version="1.0" xmlns:xsl=" xmlns:idesign="urn:idesign-net"> We found an employees container node Name:
Copyright © Software Insight Summary Prefer XPathDocument/XPathNavigator when you don’t need to modify nodes Use Extension objects to offload transform computation Resources: –Applied XML Programming for Microsoft.NET, Dino Esposito, Microsoft Press –Manipulate XML Data Easily with the XPath and XSLT APIs in the.NET Framework, Dino Esposito, MSDN Magazine, July ult.aspx ult.aspx –XSL Transformations: XSLT Alleviates XML Schema Incompatibility Headaches, Don Box, Aaron Skonnard, John Lam, MSDN Magazine, August