Published by Blaze Watson. Modified over 9 years ago.
1
XML Data Processing and Transformation
Dr. Marut Buranarach
marut.bur@nectec.or.th
269618: Special Topics in Advanced Information Technology – Semantic Web Technology
Department of Computer Science and Information Technology, Faculty of Science, Naresuan University
Semester 2, Academic Year 2557 (B.E.)
2
Outline
- XML Data Processing APIs
  - SAX (Simple API for XML)
  - DOM (Document Object Model)
- XSL Transformation (XSLT) Language
3
XML Data Processing APIs: DOM and SAX
Slides adapted from Pekka Kilpeläinen, University of Kuopio, Finland - http://www.cs.uku.fi/~kilpelai/RDK02/
4
XML Document Parsers
- Every XML application contains a parser, e.g.:
  - XML editors, XML browsers
  - XML data transformation systems
- XML parsers are becoming standard tools of application development frameworks
  - JDK v. 1.4 contains JAXP, with its default parser (Apache Crimson)
  - JAXP = Java API for XML Processing
5
Tasks of an XML Parser
- Document instance decomposition: elements, attributes, text, processing instructions, entities, ...
- Verification:
  - well-formedness checking (syntactic correctness of the XML markup)
  - validation (against a DTD or schema)
- Access to the contents of the DTD (not always supported)
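The difference between well-formedness checking and validation can be seen with a short program. The sketch below assumes the JAXP DocumentBuilder that ships with the JDK. The class and method names (WellFormedCheck, isWellFormed) are hypothetical. Validation is switched off, so only syntax errors in the markup are reported.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    /** Returns true if the text is well-formed XML; no DTD or schema validation is done. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // check syntax only
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness error was reported
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // true
        System.out.println(isWellFormed("<a><b></a>"));  // false: <b> is never closed
    }
}
```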
6
XML Processing APIs
There are two major types of XML APIs:
- Event-based API: the application implements handlers to deal with the various events
  - e.g. SAX (Simple API for XML)
- Tree-based API: compiles an XML document into an internal tree structure and allows an application to navigate that tree
  - e.g. DOM (Document Object Model)
7
SAX – an Event-based API
8
Event-based API
- The application implements a set of callback methods for handling parse events
- The parser notifies the application by method calls
- Method parameters qualify events further: element type name, names and values of attributes, values of content strings, …
9
Event-based API (2)
[Figure: a command-line call runs an SGML/XML parser on <A i="1">Hi!</A>; the parser passes an ESIS stream to the application:]
    Ai CDATA 1
    (A
    -Hi!
    )A
ESIS = Element Structure Information Set
10
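An Event Call-back Application
[Figure: the application's main routine calls Parse() on the parser. While reading <A i="1">Hi!</A>, the parser invokes the application's call-back routines startDocument(), startElement("A", [i="1"]), characters("Hi!") and endElement("A").]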
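The call-back sequence above can be made visible with a handler that records each event in the ESIS-like notation of the previous slide. This is a minimal sketch under a few assumptions. It uses JAXP's SAXParserFactory and a DefaultHandler subclass. The class name EsisTrace and its trace method are hypothetical. It does not escape special characters the way real ESIS output would. Without a DTD, SAX reports every attribute type as "CDATA".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records SAX call-backs as ESIS-like lines (a simplified sketch, not full ESIS). */
class EsisTrace extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // ESIS lists an element's attributes before its start line
        for (int i = 0; i < atts.getLength(); i++) {
            lines.add("A" + atts.getQName(i) + " " + atts.getType(i) + " " + atts.getValue(i));
        }
        lines.add("(" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        lines.add("-" + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lines.add(")" + qName);
    }

    /** Parses the given XML text and returns the recorded event lines. */
    static List<String> trace(String xml) {
        try {
            EsisTrace handler = new EsisTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints Ai CDATA 1, (A, -Hi!, )A on separate lines
        trace("<A i=\"1\">Hi!</A>").forEach(System.out::println);
    }
}
```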
11
SAX Event Call-back API
- A de facto industry standard
  - not an official standard or W3C Recommendation
  - developed by members of the xml-dev mailing list
- Supported directly by major XML parsers
  - most are Java-based and free: Sun JAXP, IBM XML4J, Oracle's XML Parser for Java, Apache Xerces; MSXML (in IE 5), James Clark's XP
12
SAX 2.0 Interfaces
SAX interfaces:
- Parser-to-application (call-back) interfaces, to attach special behaviour to parser-generated events (e.g. ContentHandler, ErrorHandler)
- Application-to-parser interfaces, to use the parser (e.g. XMLReader)
- Auxiliary interfaces, to manipulate parser-provided information (e.g. Attributes, Locator)
13
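SAX Processing Example
Input document:
    <db>
      <person idnum="1234"><last>Kilpeläinen</last><first>Pekka</first></person>
      <person idnum="5678"><last>Möttönen</last><first>Matti</first></person>
      <person idnum="9012"><last>Möttönen</last><first>Maija</first></person>
      <person idnum="3456"><last>Römppänen</last><first>Maija</first></person>
    </db>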
14
SAX Processing Example (2)
Task: format the document as a list like this:
    Pekka Kilpeläinen (1234)
    Matti Möttönen (5678)
    Maija Möttönen (9012)
    Maija Römppänen (3456)
15
SAX Processing Example (3)
Solution using event-based processing:
- at the start of a person, record the idnum (e.g., 1234)
- keep track of the starts and ends of the elements last and first, in order to record the content of those elements (e.g., "Kilpeläinen" and "Pekka")
- at the end of each person, output the collected data
16
SAX Programming Example
Application: begin by importing the relevant classes:

    import org.xml.sax.XMLReader;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    // Default (no-op) implementation of
    // interface ContentHandler:
    import org.xml.sax.helpers.DefaultHandler;
    // Sun JAXP used to obtain a SAX parser:
    import javax.xml.parsers.*;
17
SAX Programming Example (2)
Define a class that implements the relevant call-back methods:

    public class SAXDBApp extends DefaultHandler {
        // Flags to remember element context:
        private boolean InFirst = false, InLast = false;
        // Storage for element contents and
        // attribute values:
        private String FirstName, LastName, IdNum;
18
SAX Programming Example (3)
Call-back methods: record the start of the first and last elements, and the idnum attribute of a person:

        // localName is reported only by a namespace-aware parser
        public void startElement(String namespaceURI, String localName,
                                 String rawName, Attributes atts) {
            if (localName.equals("first")) InFirst = true;
            if (localName.equals("last")) InLast = true;
            if (localName.equals("person")) IdNum = atts.getValue("idnum");
        } // startElement
19
SAX Programming Example (4)
Call-back methods continue: record the text content of the elements first and last in the corresponding variables:

        public void characters(char[] ch, int start, int length) {
            // Simplification: assumes each text value arrives in a single call;
            // a SAX parser may split character data over several calls.
            if (InFirst) FirstName = new String(ch, start, length);
            if (InLast) LastName = new String(ch, start, length);
        } // characters
20
SAX Programming Example (5)
Call-back methods continue: at the exit from a person, output the collected data:

        public void endElement(String namespaceURI, String localName,
                               String qName) {
            if (localName.equals("person"))
                System.out.println(FirstName + " " + LastName
                                   + " (" + IdNum + ")");
            // Update the context flags:
            if (localName.equals("first")) InFirst = false;
            if (localName.equals("last")) InLast = false;
        } // endElement
21
SAX Programming Example (6)
Application main method:

        public static void main(String[] args) throws Exception {
            // Instantiate an XMLReader (from JAXP SAXParserFactory):
            SAXParserFactory spf = SAXParserFactory.newInstance();
            // Report namespace URIs and local names; without this,
            // localName is empty and the handler's tests never match.
            spf.setNamespaceAware(true);
            try {
                SAXParser saxParser = spf.newSAXParser();
                XMLReader xmlReader = saxParser.getXMLReader();
22
SAX Programming Example (7)
Main method continues:

                // Instantiate and pass a new
                // ContentHandler to xmlReader:
                ContentHandler handler = new SAXDBApp();
                xmlReader.setContentHandler(handler);
                for (int i = 0; i < args.length; i++) {
                    xmlReader.parse(args[i]);
                }
            } catch (Exception e) {
                System.err.println(e.getMessage());
                System.exit(1);
            }
        } // main
    } // class SAXDBApp
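The slides' handler keeps only the last characters() chunk, and it relies on a namespace-aware parser to supply localName. Below is a minimal self-contained variant, which assumes the input is given as a string:
- It compares qName, so it also works with JAXP's default (non-namespace-aware) factory.
- It collects text in a StringBuilder, so split character data is handled.

The class name PersonListHandler and its format method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch of the person-list handler that buffers text across characters() calls. */
class PersonListHandler extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String first, last, idNum;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("person")) idNum = atts.getValue("idnum");
        text.setLength(0); // start a fresh text buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "first": first = text.toString(); break;
            case "last": last = text.toString(); break;
            case "person": lines.add(first + " " + last + " (" + idNum + ")"); break;
            default: break;
        }
    }

    /** Parses a db document and returns one formatted line per person. */
    static List<String> format(String xml) {
        try {
            PersonListHandler handler = new PersonListHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<db>"
                + "<person idnum=\"1234\"><last>Kilpeläinen</last><first>Pekka</first></person>"
                + "<person idnum=\"5678\"><last>Möttönen</last><first>Matti</first></person>"
                + "</db>";
        format(xml).forEach(System.out::println);
        // Pekka Kilpeläinen (1234)
        // Matti Möttönen (5678)
    }
}
```

Resetting the buffer at every start tag means whitespace between elements is discarded, so no separate "in element" flags are needed.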
23
DOM – a Tree-based API
24
Object Model Interfaces
- The application interacts with an object-oriented representation of the document parse tree, consisting of objects like Document, Element, Attribute, Text, …
- The abstraction level is higher than in event-based interfaces; more powerful access to descendants, following siblings, …
- Disadvantage: higher memory consumption
25
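An Object-Model Based Application
[Figure: the application asks the parser object to parse <A i="1">Hi!</A>. The parser builds an in-memory document representation (Document → element A with attribute i=1 and text "Hi!"), which the application then accesses and modifies.]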
26
Document Object Model (DOM)
- Purpose: to provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)
- Overview of the W3C DOM Specification:
  - Level 1, W3C Recommendation, Oct. 1998
  - Level 2, W3C Recommendation, Nov. 2000
  - Level 3, W3C Recommendation, Apr. 2004
27
Document Object Model (DOM) (2)
- An object-based, language-neutral API for XML and HTML documents
  - allows programs and scripts to build documents, navigate their structure, and add, modify or delete elements and content
- Provides a foundation for developing querying, filtering, transformation, rendering, etc. applications
28
DOM Structure Model
- Based on object-oriented concepts:
  - methods (to access or change an object's state)
  - interfaces (declarations of a set of methods)
  - objects (encapsulations of data and methods)
- Roughly similar to the XSLT/XPath data model: a parse tree
29
DOM Structure Model (2)
[Figure: DOM tree of an invoice document. Source excerpt: <invoice> <invoicepage form="00" type="estimatedbill"> … Leila Laskuprintti … Pyynpolku 1 … 70460 KUOPIO …
- Element nodes: invoice; invoicepage (attributes form="00" and type="estimatedbill"); addressee; addressdata; name; address; streetaddress; postoffice.
- Text nodes: "Leila Laskuprintti", "Pyynpolku 1" and "70460 KUOPIO".
- Node types shown: Document, Element, NamedNodeMap, Text.]
30
Structure of DOM Level 1
I: DOM Core Interfaces
- Fundamental interfaces: basic interfaces to structured documents
- Extended interfaces (XML-specific): CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces
- more convenient access to HTML documents
31
DOM Level 2
- Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD)
- DOM Level 2 adds:
  - support for namespaces
  - access to elements by ID attribute values
  - optional features:
    - interfaces to document views and style sheets
    - an event model (user actions on elements)
    - methods for traversing the document tree and manipulating regions of a document (e.g., a region selected by the user of an editor)
32
DOM (Core)
The primary types of objects:
- Node: base type of most objects
- Element: represents the elements in a document
- DocumentFragment: root node of a document fragment
- Document: represents the root node of a standalone document
33
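DOM (Core) (2)
The following are auxiliary types of objects:
- NodeIterator: used for iterating over a set of nodes specified by a filter (standardized in DOM Level 2 Traversal)
- AttributeList: represents a collection of attribute objects, indexed by attribute name (in the DOM Level 1 Recommendation, this role is played by NamedNodeMap)
- Attribute: represents an attribute in an element object (interface Attr in the Recommendation)
- DocumentContext: a repository for metadata about a document (not part of the final Recommendation)
- DOM: provides instance-independent document operations (DOMImplementation in the Recommendation)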
34
DOM Interfaces: Node
[Figure: the invoice DOM tree, with nodes labelled by their interfaces: Document, Element, NamedNodeMap, Text.]
Node methods:
- getNodeType, getNodeValue, getOwnerDocument, getParentNode
- hasChildNodes, getChildNodes, getFirstChild, getLastChild, getPreviousSibling, getNextSibling
- hasAttributes, getAttributes
- appendChild(newChild), insertBefore(newChild, refChild), replaceChild(newChild, oldChild), removeChild(oldChild)
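The navigation methods listed above are enough for a full depth-first traversal. The minimal sketch below visits every node using getFirstChild and getNextSibling, plus getNodeName, which Node also provides, and prints an indented outline. It makes three assumptions:
- The JDK's DOM parser is used.
- The class name DomOutline is hypothetical.
- The sample's nesting of name and address under addressdata is an assumption about part of the invoice example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomOutline {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Depth-first walk: elements by name, non-blank text nodes in quotes. */
    static void outline(Node node, int depth, List<String> out) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(indent + node.getNodeName());
        } else if (node.getNodeType() == Node.TEXT_NODE && !node.getNodeValue().isBlank()) {
            out.add(indent + "\"" + node.getNodeValue().trim() + "\"");
        }
        // Visit the children through the sibling chain
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            outline(child, depth + 1, out);
        }
    }

    static List<String> outline(String xml) {
        List<String> out = new ArrayList<>();
        outline(parse(xml).getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) {
        // Assumed nesting for part of the invoice example
        String xml = "<addressdata><name>Leila Laskuprintti</name>"
                + "<address><streetaddress>Pyynpolku 1</streetaddress>"
                + "<postoffice>70460 KUOPIO</postoffice></address></addressdata>";
        outline(xml).forEach(System.out::println);
    }
}
```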
35
Object Creation in DOM
- Each DOM object X lives in the context of a Document: X.getOwnerDocument()
- Objects implementing interface X are created by factory methods D.createX(…), where D is a Document object, e.g.: createElement("A"), createAttribute("href"), createTextNode("Hello!")
- In DOM Level 1, the creation and persistent saving of Documents are left to be specified by implementations
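The factory-method pattern and the "lives in the context of a Document" rule can be checked directly. The minimal sketch below makes two assumptions: the JDK's DOM implementation is used, and the names DomFactoryDemo, buildLink and crossDocumentInsertRejected are hypothetical. It builds an element with createElement, createAttribute and createTextNode. It then shows that inserting that element into a different Document is rejected, as the DOM specification requires, and that importNode makes a copy owned by the target Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomFactoryDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds <A href="...">Hello!</A> using only the Document's factory methods. */
    static Element buildLink(Document d) {
        Element a = d.createElement("A");
        Attr href = d.createAttribute("href");
        href.setValue("http://example.org/"); // placeholder value
        a.setAttributeNode(href);
        a.appendChild(d.createTextNode("Hello!"));
        return a;
    }

    /** True if inserting a node into a different Document fails with WRONG_DOCUMENT_ERR. */
    static boolean crossDocumentInsertRejected() {
        Document d1 = newDocument(), d2 = newDocument();
        Element a = buildLink(d1);
        try {
            d2.appendChild(a);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    public static void main(String[] args) {
        Document d1 = newDocument();
        Element a = buildLink(d1);
        d1.appendChild(a);
        System.out.println(a.getOwnerDocument() == d1);     // true
        System.out.println(crossDocumentInsertRejected());  // true
        // To reuse a node elsewhere, copy it into the target Document:
        Document d2 = newDocument();
        Node copy = d2.importNode(a, true);
        d2.appendChild(copy);
        System.out.println(copy.getOwnerDocument() == d2);  // true
    }
}
```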
36
DOM Interfaces: Document
[Figure: the invoice DOM tree; Document is a subtype of Node.]
Document methods:
- getDocumentElement
- createAttribute(name), createElement(tagName), createTextNode(data)
- getDoctype()
- getElementById(IdVal)
37
DOM Interfaces: Element
[Figure: the invoice DOM tree; Element is a subtype of Node.]
Element methods:
- getTagName
- getAttributeNode(name), setAttributeNode(attr), removeAttribute(name), hasAttribute(name)
- getElementsByTagName(name)
38
Additional Core Interfaces
- NodeList for ordered lists of nodes, e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
  - the latter returns all descendant elements of type "name" in document order (the wild-card "*" matches any element type)
- Accessing a specific node, or iterating over all nodes of a NodeList:
  - e.g. Java code to process all children:

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            process(children.item(i));
39
Additional Core Interfaces (2)
- NamedNodeMap for unordered sets of nodes accessed by their name, e.g. from Node.getAttributes()
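Both collection interfaces can be used together on a small document. The minimal sketch below assumes the JDK's DOM parser, a hypothetical class name CoreCollections, and a sample whose nesting is an assumed fragment of the invoice example.
- It lists all elements through getElementsByTagName("*"), a NodeList in document order.
- It reads the attributes of invoicepage through its NamedNodeMap, both by index and by name.

NamedNodeMap defines no order, so the sketch sorts the attribute strings before returning them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CoreCollections {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** NodeList: all elements in document order ("*" matches any element type). */
    static List<String> elementNames(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            names.add(all.item(i).getNodeName());
        }
        return names;
    }

    /** NamedNodeMap: attributes have no defined order, so the result is sorted. */
    static List<String> attributes(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.add(attr.getNodeName() + "=" + attr.getNodeValue());
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<invoice><invoicepage form=\"00\" type=\"estimatedbill\">"
                + "<name>Leila Laskuprintti</name></invoicepage></invoice>");
        System.out.println(elementNames(doc)); // [invoice, invoicepage, name]
        Element page = (Element) doc.getElementsByTagName("invoicepage").item(0);
        System.out.println(attributes(page)); // [form=00, type=estimatedbill]
        // Access by name rather than by index:
        System.out.println(page.getAttributes().getNamedItem("type").getNodeValue()); // estimatedbill
    }
}
```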
40
DOM: Implementations
- Java-based parsers, e.g. IBM XML4J, Apache Xerces, Apache Crimson
- MS IE5 browser: COM programming interfaces for C/C++ and MS Visual Basic, ActiveX object programming interfaces for script languages
- XML::DOM (Perl implementation of DOM Level 1)
- etc.
41
A Java-DOM Example
- A stand-alone toy application, BuildXml, that either creates a new db document with two person elements, or adds them to an existing db document
- Based on the example in Sect. 8.6 of Deitel et al.: XML How to Program
- Uses the DOM support in Sun JAXP
42
Code of BuildXml (1)
Begin by importing the necessary packages:

    import java.io.*;
    import org.w3c.dom.*;
    import org.xml.sax.*;
    import javax.xml.parsers.*;
    // Native (parse and write) methods of the
    // JAXP 1.1 default parser (Apache Crimson);
    // Crimson is not included in current JDKs:
    import org.apache.crimson.tree.XmlDocument;
43
Code of BuildXml (2)
Class for modifying the document in file fileName:

    public class BuildXml {
        private Document document;

        public BuildXml(String fileName) {
            File docFile = new File(fileName);
            Element root = null; // doc root element
            // Obtain a factory for DOM document builders:
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
44
Code of BuildXml (3)

            try { // to get a new DocumentBuilder:
                DocumentBuilder builder = factory.newDocumentBuilder();
                if (!docFile.exists()) { // create a new doc
                    document = builder.newDocument();
                    // add a comment:
                    Comment comment = document.createComment(
                        "A simple personnel list");
                    document.appendChild(comment);
                    // Create the root element:
                    root = document.createElement("db");
                    document.appendChild(root);
45
Code of BuildXml (4)
… or if docFile already exists:

                } else { // access an existing doc
                    try { // to parse docFile
                        document = builder.parse(docFile);
                        root = document.getDocumentElement();
                    } catch (SAXException se) {
                        System.err.println("Error: " + se.getMessage());
                        System.exit(1);
                    }
                    /* A similar catch for a possible IOException */
46
Code of BuildXml (5)
Create and add two child elements to root:

                Node personNode = createPersonNode(document,
                    "1234", "Pekka", "Kilpeläinen");
                root.appendChild(personNode);
                personNode = createPersonNode(document,
                    "5678", "Irma", "Könönen");
                root.appendChild(personNode);
47
Code of BuildXml (6)
Finally, store the result document:

                try { // to write the XML document to file fileName
                    // (XmlDocument.write is a Crimson-specific method)
                    ((XmlDocument) document).write(
                        new FileOutputStream(fileName));
                } catch (IOException ioe) {
                    ioe.printStackTrace();
                }
48
Method to create person elements

        public Node createPersonNode(Document document,
                String idNum, String fName, String lName) {
            Element person = document.createElement("person");
            person.setAttribute("idnum", idNum);
            Element firstName = document.createElement("first");
            person.appendChild(firstName);
            firstName.appendChild(document.createTextNode(fName));
            /* … similarly for a lastName */
            return person;
        }
49
The main method for BuildXml

        public static void main(String[] args) {
            if (args.length > 0) {
                String fileName = args[0];
                BuildXml buildXml = new BuildXml(fileName);
            } else {
                System.err.println("Give filename as argument");
            }
        } // main
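BuildXml saves its result with Crimson's XmlDocument.write, which is not available in current JDKs. A portable alternative, sketched below, uses only the standard javax.xml.transform API: an identity Transformer (one created without a stylesheet) serializes the DOM tree. The names DomWriter, serialize and sampleDb are hypothetical. sampleDb builds a one-person db document along the lines of createPersonNode.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriter {
    /** Serializes a DOM tree with an identity Transformer (no stylesheet). */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            // To write a file instead, pass new StreamResult(new java.io.File(fileName))
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a small db document in the style of BuildXml. */
    static Document sampleDb() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("db");
            doc.appendChild(root);
            Element person = doc.createElement("person");
            person.setAttribute("idnum", "1234");
            Element first = doc.createElement("first");
            first.appendChild(doc.createTextNode("Pekka"));
            person.appendChild(first);
            root.appendChild(person);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDb()));
    }
}
```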
50
Summary of XML APIs
- XML processors make the structure and contents of XML documents available to applications through APIs
- Event-based APIs notify the application through parsing events
  - e.g., the SAX call-back interfaces
- Object-model (or tree-based) APIs provide a full parse tree
  - e.g., DOM, a W3C Recommendation
  - more convenient, but may require too many resources for very large documents
- Major parsers support both SAX and DOM
51
XSL Transformation (XSLT)
Slides taken and adapted from:
- Grigoris Antoniou & Frank van Harmelen, A Semantic Web Primer, Chapter 2: Structured Web Documents in XML
- Andy Clark, XML Style Sheets, http://people.apache.org/~andyc/xml/present/
52
Displaying XML Documents
The element
    <author>
      <name>Grigoris Antoniou</name>
      <affiliation>University of Bremen</affiliation>
      <email>ga@tzi.de</email>
    </author>
may be displayed in different ways, e.g.:
    Grigoris Antoniou
    University of Bremen
    ga@tzi.de
53
Style Sheets
- Style sheets can be written in various languages, e.g.:
  - CSS2 (cascading style sheets level 2)
  - XSL (extensible stylesheet language)
- XSL includes:
  - a transformation language (XSLT)
  - a formatting language
- Both are XML applications
54
XSL Transformations (XSLT)
- XSLT specifies rules with which an input XML document is transformed to:
  - another XML document
  - an HTML document
  - plain text
- The output document may use the same DTD or schema, or a completely different vocabulary
- XSLT can be used independently of the formatting language
55
XSLT (2)
- XSLT moves data and metadata from one XML representation to another
- XSLT is the natural choice when applications that use different DTDs or schemas need to communicate
- XSLT can be used for machine processing of content without any regard to displaying the information for people to read
- In the following example, we use XSLT only to display XML documents
56
XSLT Transformation into HTML
[Listing: an XSLT template that builds an HTML page titled "An author" around the author's details]
57
Style Sheet Output
[Rendered HTML page titled "An author":]
    Grigoris Antoniou
    University of Bremen
    ga@tzi.de
58
Observations About XSLT
- XSLT documents are XML documents: XSLT resides on top of XML
- The XSLT document defines a template
  - in this case an HTML document, with some placeholders for content to be inserted
- xsl:value-of retrieves the value of an element and copies it into the output document
  - it places some content into the template
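The template idea can be tried directly from Java, using the XSLT 1.0 processor built into the JDK (javax.xml.transform). The original stylesheet listing did not survive, so the stylesheet below is a hypothetical reconstruction. It assumes the author element has name, affiliation and email children, and it fills an HTML page with three xsl:value-of placeholders. The class name AuthorPage is also hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AuthorPage {
    // Hypothetical stylesheet: an HTML template with value-of placeholders
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/author'>"
      + "<html><head><title>An author</title></head><body>"
      + "<b><xsl:value-of select='name'/></b><br/>"
      + "<xsl:value-of select='affiliation'/><br/>"
      + "<i><xsl:value-of select='email'/></i>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet text xsl to the document text xml. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<author><name>Grigoris Antoniou</name>"
                + "<affiliation>University of Bremen</affiliation>"
                + "<email>ga@tzi.de</email></author>";
        System.out.println(transform(xml, XSL));
    }
}
```

Because the result's root element is html, the processor uses its HTML output method. It may therefore add a META element to the head and write the empty br elements as plain <br>.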
59
A Template
[Listing: the HTML template for the page titled "An author", with placeholders for the author's details …]
60
Auxiliary Templates
- We have an XML document with details of several authors
- It is a waste of effort to treat each author element separately
- In such cases, a special template is defined for author elements, which is used by the main template
61
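Example of XML Document
    <authors>
      <author>
        <name>Grigoris Antoniou</name>
        <affiliation>University of Bremen</affiliation>
        <email>ga@tzi.de</email>
      </author>
      <author>
        <name>David Billington</name>
        <affiliation>Griffith University</affiliation>
        <email>david@gu.edu.net</email>
      </author>
    </authors>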
62
Example of an Auxiliary Template
[Listing: the main template, producing an HTML page titled "Authors"]
63
Example of an Auxiliary Template (2)
[Listing: the template for author elements; surviving fragment: Affiliation: <xsl:value-of select="affiliation"/> Email: …]
64
Multiple Authors Output
    Authors

    Grigoris Antoniou
    Affiliation: University of Bremen
    Email: ga@tzi.de

    David Billington
    Affiliation: Griffith University
    Email: david@gu.edu.net
65
Explanation of the Example
- The xsl:apply-templates element processes the nodes selected by its select path expression (by default, all children of the context node); each selected node is handled by the template that matches it
- E.g., if the current template applies to /, then xsl:apply-templates applies to the root element, i.e. the authors element (/ is located above the root element)
- If the current context node is the authors element, then <xsl:apply-templates select="author"/> causes the template for author elements to be applied to all author children of the authors element
66
Explanation of the Example (2)
- It is good practice to define a template for each element type in the document
  - even if no specific processing is applied to certain elements (e.g. authors), the xsl:apply-templates element should be used
- In this way, we work from the root to the leaves of the tree, and all templates are applied
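The chain of templates described above can be seen in a runnable sketch. It uses the JDK's built-in XSLT processor, and the class name AuthorList is hypothetical. One template each handles /, authors and author, and control passes from the root towards the leaves through xsl:apply-templates. The stylesheet is an assumed illustration rather than the slides' lost listing. It produces text output instead of HTML, to keep the result easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AuthorList {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      // Root template hands over to the authors element
      + "<xsl:template match='/'><xsl:apply-templates select='authors'/></xsl:template>"
      // No specific output for authors, but processing continues with its children
      + "<xsl:template match='authors'><xsl:apply-templates select='author'/></xsl:template>"
      // Auxiliary template applied to every author element
      + "<xsl:template match='author'>"
      + "<xsl:value-of select='name'/>"
      + "<xsl:text>; Affiliation: </xsl:text><xsl:value-of select='affiliation'/>"
      + "<xsl:text>; Email: </xsl:text><xsl:value-of select='email'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String AUTHORS = "<authors>"
      + "<author><name>Grigoris Antoniou</name><affiliation>University of Bremen</affiliation>"
      + "<email>ga@tzi.de</email></author>"
      + "<author><name>David Billington</name><affiliation>Griffith University</affiliation>"
      + "<email>david@gu.edu.net</email></author>"
      + "</authors>";

    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(AUTHORS, XSL));
    }
}
```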
67
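Processing XML Attributes
Suppose we wish to transform to itself the element:
    <person firstname="…" lastname="…"/>
Wrong solution:
    <person firstname="<xsl:value-of select="@firstname"/>"
            lastname="<xsl:value-of select="@lastname"/>"/>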
68
Processing XML Attributes (2)
- The wrong solution is not well-formed, because tags are not allowed within the values of attributes
- To insert attribute values into the template, use attribute value templates (XPath expressions in braces):
    <person firstname="{@firstname}" lastname="{@lastname}"/>
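The brace syntax can be checked with the JDK's built-in XSLT processor. In the minimal sketch below, each {@…} expression in the literal result element is evaluated and its value placed into the output attribute. The class name AttributeCopy and the sample values John and Woo are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeCopy {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='person'>"
      // Attribute value templates: {@name} is replaced by the input attribute's value
      + "<person firstname='{@firstname}' lastname='{@lastname}'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<person firstname=\"John\" lastname=\"Woo\"/>", XSL));
    }
}
```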
69
Transforming an XML Document to Another
70
Transforming an XML Document to Another (2)
71
Transforming an XML Document to Another (3)
72
Creating an XSLT Document
Example of an empty XSLT document:
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                    version='1.0'>
    </xsl:stylesheet>
Note: this will simply copy the text content of the input document to the output (through XSLT's built-in template rules).
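The built-in behaviour of an empty stylesheet can be observed with the JDK's XSLT processor. The minimal sketch below assumes a hypothetical class name, EmptyStylesheet, and a hypothetical order document as sample input. The built-in rules walk down the element tree and copy text nodes, so the tags and the code attribute's value do not appear in the result. The XML declaration is suppressed so that only the copied text remains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmptyStylesheet {
    static String run(String xml) {
        try {
            String xsl = "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'/>";
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the text nodes survive: "Care and Feeding of Wombats42.00"
        System.out.println(run("<order><item code='BK123'>Care and Feeding of Wombats</item>"
                + "<price>42.00</price></order>"));
    }
}
```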
73
XSLT Features
- Templates: map input patterns to output
- Conditionals
- Loops
- Functions
- Extensions
74
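Conditionals
- If statement: <xsl:if test="…"> … </xsl:if>
- Switch statement: <xsl:choose> <xsl:when test="…"> … </xsl:when> … <xsl:otherwise> … </xsl:otherwise> </xsl:choose>
- Predicates: foo[@bar="value"]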
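All three conditional forms fit in one small stylesheet, run here through the JDK's XSLT processor. The prices document, its currency and sale attributes, and the class name PriceLabels are hypothetical.
- A predicate selects only prices greater than zero.
- xsl:choose picks a currency symbol.
- xsl:if marks items on sale.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PriceLabels {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      // Predicate: only prices greater than zero are processed
      + "<xsl:template match='/'><xsl:apply-templates select='prices/price[. &gt; 0]'/></xsl:template>"
      + "<xsl:template match='price'>"
      // Switch: choose a currency symbol
      + "<xsl:choose>"
      + "<xsl:when test=\"@currency='USD'\">$</xsl:when>"
      + "<xsl:when test=\"@currency='JPY'\">&#165;</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='@currency'/><xsl:text> </xsl:text></xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:value-of select='.'/>"
      // If: mark items on sale
      + "<xsl:if test=\"@sale='yes'\"> (on sale)</xsl:if>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String PRICES = "<prices>"
      + "<price currency='USD' sale='yes'>42.00</price>"
      + "<price currency='JPY'>4500</price>"
      + "<price currency='EUR'>0</price>"
      + "<price currency='GBP'>7.50</price>"
      + "</prices>";

    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(PRICES, XSL));
    }
}
```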
75
Loops
- For statement: <xsl:for-each select="…"> … </xsl:for-each>
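Here is a minimal xsl:for-each sketch, run through the JDK's XSLT processor. It loops over the authors document from the earlier example. An xsl:sort inside the loop orders the iterations by name, and position() numbers them in sorted order. The class name SortedAuthors is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortedAuthors {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='authors/author'>"
      + "<xsl:sort select='name'/>"               // iterate in alphabetical order of name
      + "<xsl:value-of select='position()'/><xsl:text>. </xsl:text>"
      + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String AUTHORS = "<authors>"
      + "<author><name>Grigoris Antoniou</name></author>"
      + "<author><name>David Billington</name></author>"
      + "</authors>";

    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(AUTHORS, XSL)); // 1. David Billington / 2. Grigoris Antoniou
    }
}
```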
76
XPath Functions
- Node-set functions, e.g. position(), last(), local-name(), etc.
- String functions, e.g. string(), contains(), substring(), etc.
- Boolean functions, e.g. boolean(), not(), etc.
- Number functions, e.g. number(), sum(), round(), etc.
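The same XPath 1.0 functions can be tried outside a stylesheet through the JDK's javax.xml.xpath API. The minimal sketch below evaluates one expression from each function group against a cut-down authors document. XPath.evaluate(String, Object) converts each result with XPath's string() rules, so the number 2 comes back as "2". The class name XPathFunctions and its helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathFunctions {
    static final String AUTHORS = "<authors>"
            + "<author><name>Grigoris Antoniou</name><email>ga@tzi.de</email></author>"
            + "<author><name>David Billington</name><email>david@gu.edu.net</email></author>"
            + "</authors>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates an XPath 1.0 expression; the result is converted as by XPath's string(). */
    static String eval(Document doc, String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(AUTHORS);
        String[] exprs = {
            "count(//author)",                    // node-set function
            "//author[last()]/name",              // last() in a predicate
            "substring(//author[1]/name, 1, 8)",  // string function
            "contains(//author[1]/email, 'tzi')", // string function, boolean result
            "not(//author[3])",                   // boolean function
            "round(7.5)"                          // number function
        };
        for (String e : exprs) System.out.println(e + " -> " + eval(doc, e));
    }
}
```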
77
Example Transformation
[Listings:
- Source (lines 01–06): describes an item "Care and Feeding of Wombats" with the price 42.00.
- Destination (lines 01–11): a table with the header row "Item | Price" and the row "BK123 - Care and Feeding of Wombats | $42.00".]
78
Example Transformation (1 of 14)
Match element: stylesheet lines 01–15 (surviving fragments):
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                    version='1.0'>
    … table header: Item | Price …
83
Example Transformation (6 of 14)
Match element: stylesheet lines 17–25 (surviving fragment: a " - " separator on line 20)
87
Example Transformation (10 of 14)
Match element: stylesheet lines 27–37 (surviving fragments: the currency symbols "¥" on line 30 and "$" on line 31)
91
Example Transformation (14 of 14)
Output:
    Item | Price
    BK123 - Care and Feeding of Wombats | $42.00
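The example stylesheet survives only in fragments. The sketch below shows one way such a transformation could be written and run with the JDK's XSLT processor, and it is a hypothetical reconstruction, not the original stylesheet. The assumptions are:
- the source element names (order, item, name, price) and the code and currency attributes;
- the currency test that chooses between "¥" and "$";
- the class name OrderTable.

It matches the root, then each item, then each price, and builds the Item/Price table shown in the output above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OrderTable {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      // Match the root: emit the page and table header, then process the items
      + "<xsl:template match='/'>"
      + "<html><body><table border='1'>"
      + "<tr><th>Item</th><th>Price</th></tr>"
      + "<xsl:apply-templates select='order/item'/>"
      + "</table></body></html>"
      + "</xsl:template>"
      // Match item: one table row, "code - name" in the first cell
      + "<xsl:template match='item'>"
      + "<tr><td><xsl:value-of select='@code'/><xsl:text> - </xsl:text>"
      + "<xsl:value-of select='name'/></td>"
      + "<td><xsl:apply-templates select='price'/></td></tr>"
      + "</xsl:template>"
      // Match price: currency symbol followed by the amount
      + "<xsl:template match='price'>"
      + "<xsl:choose>"
      + "<xsl:when test=\"@currency='JPY'\">&#165;</xsl:when>"
      + "<xsl:otherwise>$</xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String ORDER = "<order><item code='BK123'>"
      + "<name>Care and Feeding of Wombats</name>"
      + "<price currency='USD'>42.00</price>"
      + "</item></order>";

    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(ORDER, XSL));
    }
}
```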
92
Rendering XML in Browsers
- Recent browsers (e.g. IE 6.0+) support XSLT
- Insert an "xml-stylesheet" processing instruction into the XML document, e.g. <?xml-stylesheet type="text/xsl" href="…"?>
- Output:
    Item | Price
    BK123 - Care and Feeding of Wombats | $42.00
93
Useful Links
- XPath 1.0 Specification: http://www.w3.org/TR/xpath
- XSLT 1.0 Specification: http://www.w3.org/TR/xslt