XML Data Processing and Transformation. Dr. Marut Buranarach. 269618: Special Topics in Advanced Information Technology.


1 XML Data Processing and Transformation
Dr. Marut Buranarach, marut.bur@nectec.or.th
269618: Special Topics in Advanced Information Technology – Semantic Web Technology
Department of Computer Science and Information Technology, Faculty of Science, Naresuan University
Semester 2, Academic Year 2557 (Buddhist Era)

2 Outline
- XML Data Processing APIs
  - SAX (Simple API for XML)
  - DOM (Document Object Model)
- XSL Transformation (XSLT) Language

3 XML Data Processing APIs: DOM and SAX
Slides adapted from Pekka Kilpeläinen, University of Kuopio, Finland - http://www.cs.uku.fi/~kilpelai/RDK02/

4 XML Document Parsers
- Every XML application contains a parser
  - XML editors, XML browsers
  - XML data transformation systems
- XML parsers are becoming standard tools of application development frameworks
  - JDK v. 1.4 contains JAXP (Java API for XML Processing), with its default parser (Apache Crimson)

5 Tasks of an XML Parser
- Document instance decomposition
  - Elements, attributes, text, processing instructions, entities, ...
- Verification
  - Well-formedness checking: syntactical correctness of XML markup
  - Validation (against a DTD or Schema)
- Access to contents of the DTD
  - Not always supported
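The well-formedness check listed above can be sketched with the JAXP DOM parser. This is a minimal illustration, not part of the original slides; the class name WellFormedCheck and the sample inputs are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named only for this sketch.
public class WellFormedCheck {
    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // markup error reported by the parser
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<A i=\"1\">Hi!</A>")); // matching tags
        System.out.println(isWellFormed("<A i=\"1\">Hi!</B>")); // mismatched end tag
    }
}
```

Validation against a DTD or schema is a separate step and is not covered by this sketch.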

6 XML Processing APIs
There are two major types of XML APIs:
- Event-based API
  - The application implements handlers to deal with the various events
  - Simple API for XML (SAX)
- Tree-based API
  - Compiles an XML document into an internal tree structure and allows an application to navigate that tree
  - Document Object Model (DOM)

7 SAX – an Event-based API

8 Event-based API
- Application implements a set of callback methods for handling parse events
  - parser notifies the application by method calls
  - method parameters qualify events further
    - element type name
    - names and values of attributes
    - values of content strings, ...

9 Event-based API (2)
[Figure: the application starts an SGML/XML parser through a command-line call; for the input <A i="1">Hi!</A> the parser emits the ESIS stream: Ai CDATA 1, (A, -Hi!, )A]
ESIS = Element Structure Information Set

10 An event call-back application
[Figure: the application's main routine calls Parse(); for the input <A i="1">Hi!</A> the parser invokes the callback routines startDocument(), startElement() with "A", [i="1"], characters() with "Hi!", and endElement() with "A"]
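The callback sequence in the figure can be observed with a small handler that records each event. This is a sketch under assumptions: the class name EventLogger and the log format are invented for illustration, and the parser is allowed to deliver text in more than one characters() call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every parse event it receives.
public class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        StringBuilder sb = new StringBuilder("startElement " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i))
              .append('=').append(atts.getValue(i));
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> log(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : log("<A i=\"1\">Hi!</A>")) System.out.println(e);
    }
}
```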

11 SAX Event Call-back API
- A de-facto industry standard
  - Not an official standard or W3C Recommendation
  - Developed by members of the xml-dev mailing list
- Supported directly by major XML parsers
  - Most are Java based and free: Sun JAXP, IBM XML4J, Oracle's XML Parser for Java, Apache Xerces; MSXML (in IE 5), James Clark's XP

12 SAX 2.0 Interfaces
SAX interfaces:
- Parser-to-application (or call-back) interfaces: to attach special behaviour to parser-generated events
- Application-to-parser: to use the parser
- Auxiliary: to manipulate parser-provided information

13 SAX Processing Example
Input document:
    <db>
      <person idnum="1234"><last>Kilpeläinen</last><first>Pekka</first></person>
      <person idnum="5678"><last>Möttönen</last><first>Matti</first></person>
      <person idnum="9012"><last>Möttönen</last><first>Maija</first></person>
      <person idnum="3456"><last>Römppänen</last><first>Maija</first></person>
    </db>

14 SAX Processing Example (2)
Task: Format the document as a list like this:
    Pekka Kilpeläinen (1234)
    Matti Möttönen (5678)
    Maija Möttönen (9012)
    Maija Römppänen (3456)

15 SAX Processing Example (3)
Solution using event-based processing:
- at the start of a person, record the idnum (e.g., 1234)
- keep track of starts and ends of elements last and first, in order to record the content of those elements (e.g., "Kilpeläinen" and "Pekka")
- at the end of each person, output the collected data

16 SAX Programming Example
Application: Begin by importing relevant classes:
    import org.xml.sax.XMLReader;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    // Default (no-op) implementation of interface ContentHandler:
    import org.xml.sax.helpers.DefaultHandler;
    // SUN JAXP used to obtain a SAX parser:
    import javax.xml.parsers.*;

17 SAX Programming Example (2)
Define a class to implement relevant call-back methods:
    public class SAXDBApp extends DefaultHandler {
        // Flags to remember element context:
        private boolean InFirst = false, InLast = false;
        // Storage for element contents and attribute values:
        private String FirstName, LastName, IdNum;

18 SAX Programming Example (3)
Call-back methods: record the start of first and last elements, and the idnum attribute of a person:
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts) {
        if (localName.equals("first")) InFirst = true;
        if (localName.equals("last")) InLast = true;
        if (localName.equals("person"))
            IdNum = atts.getValue("idnum");
    } // startElement

19 SAX Programming Example (4)
Call-back methods continue: Record the text content of elements first and last in corresponding variables:
    public void characters(char ch[], int start, int length) {
        if (InFirst) FirstName = new String(ch, start, length);
        if (InLast) LastName = new String(ch, start, length);
    } // characters

20 SAX Programming Example (5)
Call-back methods continue: at an exit from person, output the collected data:
    public void endElement(String namespaceURI, String localName,
                           String qName) {
        if (localName.equals("person"))
            System.out.println(FirstName + " " + LastName +
                               " (" + IdNum + ")");
        // Update the context flags:
        if (localName.equals("first")) InFirst = false;
        if (localName.equals("last")) InLast = false;
    } // endElement

21 SAX Programming Example (6)
Application main method:
    public static void main(String args[]) throws Exception {
        // Instantiate an XMLReader (from JAXP SAXParserFactory):
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // JAXP factories are not namespace-aware by default; without
        // this, localName is reported as an empty string:
        spf.setNamespaceAware(true);
        try {
            SAXParser saxParser = spf.newSAXParser();
            XMLReader xmlReader = saxParser.getXMLReader();

22 SAX Programming Example (7)
Main method continues:
            // Instantiate and pass a new ContentHandler to xmlReader:
            ContentHandler handler = new SAXDBApp();
            xmlReader.setContentHandler(handler);
            for (int i = 0; i < args.length; i++) {
                xmlReader.parse(args[i]);
            }
        } catch (Exception e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    } // main
    } // class SAXDBApp
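One caveat for this kind of handler: a SAX parser may deliver the text of a single element in several characters() calls, so assigning the text in characters() can keep only the last chunk. The following self-contained variant is a sketch, not the slides' code. It collects text in a StringBuilder and uses qName so that it works with the default, non-namespace-aware factory. The class name PersonLister and the in-memory input are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the person-list handler.
public class PersonLister extends DefaultHandler {
    private final StringBuilder text = new StringBuilder(); // text since last start tag
    private String first, last, idNum;
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        text.setLength(0); // begin collecting text for the new element
        if (qName.equals("person")) idNum = atts.getValue("idnum");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("first")) first = text.toString();
        if (qName.equals("last")) last = text.toString();
        if (qName.equals("person"))
            lines.add(first + " " + last + " (" + idNum + ")");
    }

    // Parses the XML text and returns one formatted line per person.
    static List<String> list(String xml) throws Exception {
        PersonLister handler = new PersonLister();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes keep this source file ASCII-only.
        String xml = "<db>"
            + "<person idnum=\"1234\"><last>Kilpel\u00e4inen</last><first>Pekka</first></person>"
            + "<person idnum=\"5678\"><last>M\u00f6tt\u00f6nen</last><first>Matti</first></person>"
            + "</db>";
        for (String line : list(xml)) System.out.println(line);
    }
}
```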

23 DOM – a Tree-based API

24 Object Model Interfaces
- Application interacts with an object-oriented representation of
  - the parser
  - the document parse tree, consisting of objects like Document, Element, Attribute, Text, ...
- Abstraction level higher than in event-based interfaces; more powerful access
  - to descendants, following siblings, ...
- Disadvantage: higher memory consumption

25 An Object-Model Based Application
[Figure: the application asks a parser object to parse the input <A i="1">Hi!</A>; the parser builds an in-memory document representation (Document, element A with attribute i=1, text "Hi!"), which the application then accesses and modifies]

26 Document Object Model (DOM)
- To provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)
- Overview of W3C DOM Specification
  - Level 1, W3C Rec, Oct. 1998
  - Level 2, W3C Rec, Nov. 2000
  - Level 3, W3C Rec, Apr. 2004

27 Document Object Model (DOM) (2)
- An object-based, language-neutral API for XML and HTML documents
  - allows programs and scripts to build documents, navigate their structure, and add, modify or delete elements and content
  - provides a foundation for developing querying, filtering, transformation, rendering etc.

28 DOM structure model
- Based on O-O concepts:
  - methods (to access or change object's state)
  - interfaces (declaration of a set of methods)
  - objects (encapsulation of data and methods)
- Roughly similar to the XSLT/XPath data model
  - a parse tree

29 DOM structure model
[Figure: the XML fragment <invoice> <invoicepage form="00" type="estimatedbill"> ... with the text values "Leila Laskuprintti", "Pyynpolku 1" and "70460 KUOPIO", drawn as a DOM tree: a Document node; Element nodes invoice, invoicepage, addressee, addressdata, name, address, streetaddress, postoffice; a NamedNodeMap holding the attributes form="00" and type="estimatedbill"; Text nodes for the character data]

30 Structure of DOM Level 1
I: DOM Core Interfaces
- Fundamental interfaces
  - basic interfaces to structured documents
- Extended interfaces
  - XML specific: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces
- more convenient to access HTML documents

31 DOM Level 2
- Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD)
- DOM Level 2 adds
  - support for namespaces
  - accessing elements by ID attribute values
  - optional features
    - interfaces to document views and style sheets
    - an event model (user actions on elements)
    - methods for traversing the document tree and manipulating regions of the document (e.g., selected by the user of an editor)

32 DOM (core)
The primary types of objects:
- Node: base type of most objects
- Element: represents the elements in a document
- DocumentFragment: root node of a document fragment
- Document: represents the root node of a standalone document

33 DOM (core)
The following are auxiliary types of objects:
- NodeIterator: used for iterating over a set of nodes specified by a filter
- AttributeList: represents a collection of attribute objects, indexed by attribute name
- Attribute: represents an attribute in an element object
- DocumentContext: a repository for metadata about a document
- DOM: provides instance-independent document operations

34 DOM interfaces: Node
[Figure: the invoice DOM tree from the DOM structure model slide, with the Node interface highlighted]
Methods of Node:
- getNodeType, getNodeValue, getOwnerDocument, getParentNode
- hasChildNodes, getChildNodes
- getFirstChild, getLastChild, getPreviousSibling, getNextSibling
- hasAttributes, getAttributes
- appendChild(newChild), insertBefore(newChild, refChild), replaceChild(newChild, oldChild), removeChild(oldChild)
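The navigation methods of Node are enough to walk a whole tree. The sketch below prints an indented outline using getFirstChild and getNextSibling. The class name TreeWalk is hypothetical, and the nesting of the invoice elements in main is an assumption inferred from the figure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this sketch.
public class TreeWalk {
    // Appends one line per element or non-blank text node, indented by depth.
    static void walk(Node node, int depth, StringBuilder out) {
        String indent = "";
        for (int i = 0; i < depth; i++) indent += "  ";
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append(node.getNodeName()).append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE
                   && !node.getNodeValue().trim().isEmpty()) {
            out.append(indent).append('"')
               .append(node.getNodeValue().trim()).append("\"\n");
        }
        // Visit the children through the sibling links of the Node interface.
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Parses the XML text and returns the outline of its element tree.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
            "<invoice><invoicepage form=\"00\" type=\"estimatedbill\">"
          + "<addressee><addressdata><name>Leila Laskuprintti</name>"
          + "<address><streetaddress>Pyynpolku 1</streetaddress>"
          + "<postoffice>70460 KUOPIO</postoffice></address>"
          + "</addressdata></addressee></invoicepage></invoice>"));
    }
}
```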

35 Object Creation in DOM
- Each DOM object X lives in the context of a Document: X.getOwnerDocument()
- Objects implementing interface X are created by factory methods D.createX(…), where D is a Document object. E.g.:
  - createElement("A"), createAttribute("href"), createTextNode("Hello!")
- Creation and persistent saving of Documents is left to be specified by implementations

36 DOM interfaces: Document
[Figure: the invoice DOM tree from the DOM structure model slide, with the Document interface highlighted]
Methods of Document:
- getDocumentElement
- createAttribute(name), createElement(tagName), createTextNode(data)
- getDocType()
- getElementById(IdVal)

37 DOM interfaces: Element
[Figure: the invoice DOM tree from the DOM structure model slide, with the Element interface highlighted]
Methods of Element:
- getTagName
- getAttributeNode(name), setAttributeNode(attr), removeAttribute(name)
- getElementsByTagName(name)
- hasAttribute(name)

38 Additional Core Interfaces
- NodeList for ordered lists of nodes
  - e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
    - all descendant elements of type "name" in document order (wild-card "*" matches any element type)
- Accessing a specific node, or iterating over all nodes of a NodeList:
  - E.g. Java code to process all children:
    for (int i = 0; i < node.getChildNodes().getLength(); i++)
        process(node.getChildNodes().item(i));

39 Additional Core Interfaces (2)
- NamedNodeMap for unordered sets of nodes accessed by their name:
  - e.g. from Node.getAttributes()
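Both collection interfaces can be seen together in a short sketch: a NodeList from getElementsByTagName("*") is traversed by index, and each element's NamedNodeMap of attributes is read by position. The class name ListsAndMaps and the output format are assumptions for illustration. Note that the order of attributes in a NamedNodeMap is not defined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this sketch.
public class ListsAndMaps {
    // Returns one entry per element: its name followed by its attributes.
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        // "*" matches every element; the list is in document order.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            StringBuilder sb = new StringBuilder(element.getNodeName());
            NamedNodeMap attrs = element.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                sb.append(' ').append(attr.getNodeName())
                  .append('=').append(attr.getNodeValue());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String s : describe("<db><person idnum=\"1234\"><first>Pekka</first></person></db>"))
            System.out.println(s);
    }
}
```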

40 DOM: Implementations
- Java-based parsers
  - e.g. IBM XML4J, Apache Xerces, Apache Crimson
- MS IE5 browser: COM programming interfaces for C/C++ and MS Visual Basic, ActiveX object programming interfaces for script languages
- XML::DOM (Perl implementation of DOM Level 1)
- etc.

41 A Java-DOM Example
- A stand-alone toy application BuildXml
  - either creates a new db document with two person elements, or adds them to an existing db document
  - based on the example in Sect. 8.6 of Deitel et al: XML - How to program
- Uses DOM support in Sun JAXP

42 Code of BuildXml (1)
Begin by importing necessary packages:
    import java.io.*;
    import org.w3c.dom.*;
    import org.xml.sax.*;
    import javax.xml.parsers.*;
    // Native (parse and write) methods of the
    // JAXP 1.1 default parser (Apache Crimson):
    import org.apache.crimson.tree.XmlDocument;

43 Code of BuildXml (2)
Class for modifying the document in file fileName:
    public class BuildXml {
        private Document document;

        public BuildXml(String fileName) {
            File docFile = new File(fileName);
            Element root = null; // doc root element
            // Obtain a factory for DOM parsers (document builders):
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

44 Code of BuildXml (3)
            try { // to get a new DocumentBuilder:
                DocumentBuilder builder = factory.newDocumentBuilder();
                if (!docFile.exists()) { // create new doc
                    document = builder.newDocument();
                    // add a comment:
                    Comment comment = document.createComment(
                        "A simple personnel list");
                    document.appendChild(comment);
                    // Create the root element:
                    root = document.createElement("db");
                    document.appendChild(root);

45 Code of BuildXml (4)
… or if docFile already exists:
                } else { // access an existing doc
                    try { // to parse docFile
                        document = builder.parse(docFile);
                        root = document.getDocumentElement();
                    } catch (SAXException se) {
                        System.err.println("Error: " + se.getMessage());
                        System.exit(1);
                    }
                    /* A similar catch for a possible IOException */

46 Code of BuildXml (5)
Create and add two child elements to root:
    Node personNode = createPersonNode(document, "1234", "Pekka", "Kilpeläinen");
    root.appendChild(personNode);
    personNode = createPersonNode(document, "5678", "Irma", "Könönen");
    root.appendChild(personNode);

47 Code of BuildXml (6)
Finally, store the result document:
    try { // to write the XML document to file fileName
        ((XmlDocument) document).write(
            new FileOutputStream(fileName));
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
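The write() call above relies on the Crimson-specific XmlDocument class. A parser-independent alternative, shown here as a sketch rather than as the slides' approach, is to serialize the Document with an identity Transformer from javax.xml.transform. The class name SaveDom and its helper methods are assumptions for illustration, and the result goes to a string instead of a file to keep the example self-contained.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, named only for this sketch.
public class SaveDom {
    // Builds a small db document with one person element.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("db");
        doc.appendChild(root);
        Element person = doc.createElement("person");
        person.setAttribute("idnum", "1234");
        Element first = doc.createElement("first");
        first.appendChild(doc.createTextNode("Pekka"));
        person.appendChild(first);
        root.appendChild(person);
        return doc;
    }

    // Serializes the document to XML text with an identity transformation.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build()));
    }
}
```

Passing new StreamResult(new File(fileName)) instead of the StringWriter would write the document to a file.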

48 Methods to create person elements
    public Node createPersonNode(Document document, String idNum,
                                 String fName, String lName) {
        Element person = document.createElement("person");
        person.setAttribute("idnum", idNum);
        Element firstName = document.createElement("first");
        person.appendChild(firstName);
        firstName.appendChild(document.createTextNode(fName));
        /* … similarly for a lastName */
        return person;
    }

49 The main method for BuildXml
    public static void main(String args[]) {
        if (args.length > 0) {
            String fileName = args[0];
            BuildXml buildXml = new BuildXml(fileName);
        } else {
            System.err.println("Give filename as argument");
        }
    } // main

50 Summary of XML APIs
- XML processors make the structure and contents of XML documents available to applications through APIs
- Event-based APIs
  - notify the application through parsing events
  - e.g., the SAX call-back interfaces
- Object-model (or tree) based APIs
  - provide a full parse tree
  - e.g., DOM, a W3C Recommendation
  - more convenient, but may require too many resources with very large documents
- Major parsers support both SAX and DOM

51 XSL Transformation (XSLT)
Slides taken and adapted from
- Grigoris Antoniou & Frank van Harmelen, A Semantic Web Primer, Chapter 2: Structured Web Documents in XML
- Andy Clark, XML Style Sheets, http://people.apache.org/~andyc/xml/present/

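52 Displaying XML Documents
    <author>
      <name>Grigoris Antoniou</name>
      <affiliation>University of Bremen</affiliation>
      <email>ga@tzi.de</email>
    </author>
may be displayed in different ways, for example with each value on its own line:
    Grigoris Antoniou
    University of Bremen
    ga@tzi.de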

53 Style Sheets
- Style sheets can be written in various languages
  - E.g. CSS2 (cascading style sheets level 2)
  - XSL (extensible stylesheet language)
- XSL includes
  - a transformation language (XSLT)
  - a formatting language
  - Both are XML applications

54 XSL Transformations (XSLT)
- XSLT specifies rules with which an input XML document is transformed to
  - another XML document
  - an HTML document
  - plain text
- The output document may use the same DTD or schema, or a completely different vocabulary
- XSLT can be used independently of the formatting language

55 XSLT (2)
- Move data and metadata from one XML representation to another
- XSLT is chosen when applications that use different DTDs or schemas need to communicate
- XSLT can be used for machine processing of content without any regard to displaying the information for people to read
- In the following example, we use XSLT only to display XML documents

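56 XSLT Transformation into HTML
[Stylesheet listing: a template that produces an HTML page titled "An author"]

57 Style Sheet Output
[HTML listing: a page titled "An author" containing Grigoris Antoniou, University of Bremen, ga@tzi.de]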

58 Observations About XSLT
- XSLT documents are XML documents
  - XSLT resides on top of XML
- The XSLT document defines a template
  - In this case an HTML document, with some placeholders for content to be inserted
- xsl:value-of retrieves the value of an element and copies it into the output document
  - It places some content into the template
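A stylesheet of this kind can be run from Java through the JAXP transformation API. The sketch below is an illustration under assumptions, not the slides' listing: the stylesheet text, the element names name, affiliation and email, and the class name AuthorToHtml are all chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this sketch.
public class AuthorToHtml {
    // Assumed stylesheet: an HTML template with xsl:value-of placeholders.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/author'>"
      + "<html><head><title>An author</title></head><body>"
      + "<b><xsl:value-of select='name'/></b><br/>"
      + "<xsl:value-of select='affiliation'/><br/>"
      + "<i><xsl:value-of select='email'/></i>"
      + "</body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<author><name>Grigoris Antoniou</name>"
            + "<affiliation>University of Bremen</affiliation>"
            + "<email>ga@tzi.de</email></author>";
        System.out.println(transform(XSLT, xml));
    }
}
```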

59 A Template
[HTML listing: the page template titled "An author", with "..." in place of the inserted content]

60 Auxiliary Templates
- We have an XML document with details of several authors
- It is a waste of effort to treat each author element separately
- In such cases, a special template is defined for author elements, which is used by the main template

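61 Example of XML Document
    <authors>
      <author>
        <name>Grigoris Antoniou</name>
        <affiliation>University of Bremen</affiliation>
        <email>ga@tzi.de</email>
      </author>
      <author>
        <name>David Billington</name>
        <affiliation>Griffith University</affiliation>
        <email>david@gu.edu.net</email>
      </author>
    </authors>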

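62 Example of an Auxiliary Template
[Stylesheet listing: the main template, producing an HTML page titled "Authors"]

63 Example of an Auxiliary Template (2)
[Stylesheet listing, continued: the template for author elements, which outputs "Affiliation:" followed by <xsl:value-of select="affiliation"/> and then "Email:" followed by the author's email]

64 Multiple Authors Output
    Authors
    Grigoris Antoniou
      Affiliation: University of Bremen
      Email: ga@tzi.de
    David Billington
      Affiliation: Griffith University
      Email: david@gu.edu.net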

65 Explanation of the Example
- The xsl:apply-templates element causes all children of the context node to be matched against the selected path expression
  - E.g., if the current template applies to /, then the element xsl:apply-templates applies to the root element
  - I.e. the authors element (/ is located above the root element)
  - If the current context node is the authors element, then the element xsl:apply-templates select="author" causes the template for the author elements to be applied to all author children of the authors element

66 Explanation of the Example (2)
- It is good practice to define a template for each element type in the document
  - Even if no specific processing is applied to certain elements, the xsl:apply-templates element should be used
  - E.g. authors
- In this way, we work from the root to the leaves of the tree, and all templates are applied
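The root-to-leaves pattern described above can be tried out with the following sketch. The stylesheet is an assumed reconstruction for illustration (one template each for /, authors and author), and the class name AuthorsList and the HTML layout are not from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this sketch.
public class AuthorsList {
    // Assumed stylesheet: each template hands its children on with apply-templates.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<html><head><title>Authors</title></head><body>"
      + "<xsl:apply-templates select='authors'/>"
      + "</body></html></xsl:template>"
      + "<xsl:template match='authors'>"
      + "<xsl:apply-templates select='author'/>"
      + "</xsl:template>"
      + "<xsl:template match='author'>"
      + "<h2><xsl:value-of select='name'/></h2>"
      + "<p>Affiliation: <xsl:value-of select='affiliation'/></p>"
      + "<p>Email: <xsl:value-of select='email'/></p>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<authors>"
            + "<author><name>Grigoris Antoniou</name>"
            + "<affiliation>University of Bremen</affiliation>"
            + "<email>ga@tzi.de</email></author>"
            + "<author><name>David Billington</name>"
            + "<affiliation>Griffith University</affiliation>"
            + "<email>david@gu.edu.net</email></author>"
            + "</authors>";
        System.out.println(transform(XSLT, xml));
    }
}
```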

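67 Processing XML Attributes
Suppose we wish to transform to itself the element:
    <person firstname="…" lastname="…"/>
Wrong solution:
    <person firstname="<xsl:value-of select="@firstname"/>"
            lastname="<xsl:value-of select="@lastname"/>"/>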

68 Processing XML Attributes (2)
- Not well-formed, because tags are not allowed within the values of attributes
- We wish to add attribute values into the template:
    <person firstname="{@firstname}" lastname="{@lastname}"/>
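The curly-brace form (an attribute value template) can be checked with a short sketch. The attribute values John and Woo, the class name AttributeTemplate and the surrounding stylesheet are assumptions for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this sketch.
public class AttributeTemplate {
    // Assumed stylesheet: {…} inside an attribute value is evaluated as XPath.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='person'>"
      + "<person firstname='{@firstname}' lastname='{@lastname}'/>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xslt, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input values.
        System.out.println(transform(XSLT, "<person firstname='John' lastname='Woo'/>"));
    }
}
```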

69 Transforming an XML Document to Another

70 Transforming an XML Document to Another (2)

71 Transforming an XML Document to Another (3)

72 Creating XSLT document
Example of an empty XSLT document:
    <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                    version='1.0'>
    </xsl:stylesheet>
Note: Because of XSLT's built-in template rules, this simply copies the text content of the input document to the output.
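The note about the empty stylesheet can be verified with a small experiment. In this sketch the input document (a book with a title and a price) and the class name EmptyStylesheet are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this sketch.
public class EmptyStylesheet {
    // A stylesheet with no template rules of its own.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "</xsl:stylesheet>";

    // Applies the empty stylesheet; only the built-in rules are in effect.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: element names are hypothetical.
        System.out.println(transform(
            "<book><title>Care and Feeding of Wombats</title>"
          + "<price>42.00</price></book>"));
    }
}
```

The element tags are dropped and only the character data reaches the output.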

73 XSLT Features
- Templates
  - Map input patterns to output
- Conditionals
- Loops
- Functions
- Extensions

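74 Conditionals
- If statement
    <xsl:if test="…"> ... </xsl:if>
- Switch statement
    <xsl:choose>
      <xsl:when test="…"> ... </xsl:when>
      <xsl:otherwise> ... </xsl:otherwise>
    </xsl:choose>
- Predicates
    foo[@bar="value"]

75 Loops
- For statement
    <xsl:for-each select="…"> … </xsl:for-each>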
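The conditional and loop constructs can be combined in one small stylesheet. The sketch below is an illustration under assumptions: the order/book input vocabulary, the currency attribute, and the class name CondAndLoop are invented for the example. It loops over books with xsl:for-each, filters with a predicate, and picks a currency sign with xsl:choose.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this sketch.
public class CondAndLoop {
    // Assumed stylesheet producing one text line per priced book.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='order/book[price]'>"   // predicate: books with a price
      + "<xsl:value-of select='@code'/>"
      + "<xsl:text> - </xsl:text>"
      + "<xsl:value-of select='title'/>"
      + "<xsl:text>: </xsl:text>"
      + "<xsl:choose>"
      + "<xsl:when test=\"price/@currency='JPY'\">&#165;</xsl:when>"
      + "<xsl:otherwise>$</xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:value-of select='price'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the text output.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(
            "<order><book code='BK123'>"
          + "<title>Care and Feeding of Wombats</title>"
          + "<price currency='USD'>42.00</price></book></order>"));
    }
}
```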

76 XPath Functions
- Node-set functions
  - e.g. position(), last(), local-name(), etc.
- String functions
  - e.g. string(), contains(), substring(), etc.
- Boolean functions
  - e.g. boolean(), not(), etc.
- Number functions
  - e.g. number(), sum(), round(), etc.
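These functions can also be evaluated outside a stylesheet with the javax.xml.xpath API, which is convenient for experimenting. A minimal sketch follows; the class name XPathFunctions and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this sketch.
public class XPathFunctions {
    // Assumed sample document.
    static final String XML =
        "<order><book code='BK123'>"
      + "<title>Care and Feeding of Wombats</title>"
      + "<price>42.00</price></book></order>";

    // Evaluates an XPath 1.0 expression against XML text.
    static Object eval(String expr, String xml, QName type) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)), type);
    }

    public static void main(String[] args) throws Exception {
        // Number functions
        System.out.println(eval("sum(//price)", XML, XPathConstants.NUMBER));
        System.out.println(eval("count(//book)", XML, XPathConstants.NUMBER));
        // String functions
        System.out.println(eval("substring(//title, 1, 4)", XML, XPathConstants.STRING));
        // Boolean functions
        System.out.println(eval("contains(//title, 'Wombats')", XML, XPathConstants.BOOLEAN));
        System.out.println(eval("not(//author)", XML, XPathConstants.BOOLEAN));
    }
}
```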

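77 Example Transformation
[Two listings side by side. Source: an XML document describing one item, with the values "Care and Feeding of Wombats" and 42.00. Destination: an HTML document with the headings "Item" and "Price" and the values "BK123 - Care and Feeding of Wombats" and "$42.00"]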

78-82 Example Transformation (1 of 14 to 5 of 14)
[Stylesheet listing, lines 01-15, shown five times with successive elements highlighted: the xsl:stylesheet start tag (xmlns:xsl='http://www.w3.org/1999/XSL/Transform', version='1.0') and a template that outputs the headings "Item" and "Price"]

83-86 Example Transformation (6 of 14 to 9 of 14)
[Stylesheet listing, lines 17-25, shown four times with successive elements highlighted: a template that outputs two values separated by " - "]

87-90 Example Transformation (10 of 14 to 13 of 14)
[Stylesheet listing, lines 27-37, shown four times with successive elements highlighted: a template that outputs "¥" or "$" before a value]

91 Example Transformation (14 of 14)
Output:
    Item                                  Price
    BK123 - Care and Feeding of Wombats   $42.00

92 Rendering XML in Browsers
- Latest browsers (e.g. IE 6.0+) have support for XSLT
- Insert an "xml-stylesheet" processing instruction into the XML document
- Output:
    Item                                  Price
    BK123 - Care and Feeding of Wombats   $42.00

93 Useful Links
- XPath 1.0 Specification: http://www.w3.org/TR/xpath
- XSLT 1.0 Specification: http://www.w3.org/TR/xslt
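The processing instruction itself did not survive in this transcript. As a sketch of what such an instruction looks like, the DOM code below adds one in front of the root element and serializes the result. The stylesheet file name order.xsl, the root element order and the class name StylesheetPi are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical helper class, named only for this sketch.
public class StylesheetPi {
    // Builds <order/> preceded by an xml-stylesheet processing instruction.
    static String withStylesheet(String href) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("order"));
        ProcessingInstruction pi = doc.createProcessingInstruction(
            "xml-stylesheet", "type=\"text/xsl\" href=\"" + href + "\"");
        // The instruction belongs in the prolog, before the root element.
        doc.insertBefore(pi, doc.getDocumentElement());

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withStylesheet("order.xsl")); // hypothetical file name
    }
}
```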

