eXtensible Markup Language (XML)
Heidi Lischer, University of Bern

Overview
- XML
- DTD
- XML Schema
- CSS
- XSL/XSLT
- R and XML

13. November 2019

What is XML?
- XML is for structuring data: XML is a set of rules for designing text formats that let you structure your data.
- XML looks a bit like HTML: like HTML, XML uses tags (delimited by '<' and '>') and attributes (name="value"). XML uses tags only to delimit pieces of data and leaves the interpretation of the data entirely to the application that reads it.
- XML is text, but isn't meant to be read: like HTML files, XML files are text files that people shouldn't have to read.
Source: http://www.w3.org/XML/1999/XML-in-10-points.html.en

What is XML? (continued)
- XML is a family of technologies: "the XML family" is a growing set of modules that offer useful services for important and frequently demanded tasks (XLink, XPointer, CSS, XSL, XSLT, DOM, XML Schemas, ...).
- XML is new, but not that new: development of XML started in 1996, and it has been a W3C Recommendation since February 1998. Before XML there was SGML, developed in the early '80s.
- XML is modular: XML allows you to define a new document format by combining and reusing other formats.
Source: http://www.w3.org/XML/1999/XML-in-10-points.html.en

Main Difference Between XML and HTML
- XML was designed to carry data.
- XML is not a replacement for HTML.
- HTML is about displaying information, while XML is about describing information.
- XML is a cross-platform, software- and hardware-independent tool for transmitting information.
- Disadvantage of XML: encoding and decoding the data takes more effort.

What is XML for?
- Information identification: because you can define your own markup, you can define meaningful names for all your information items.
- Information storage: because it is backed by an international standard, it will remain accessible and processable as a data format.
- Information structure: it can be used to store and identify any kind of (hierarchical) information structure.
- Messaging and data transfer: it provides a common envelope for inter-process communication (messaging).
- Web services: thousands of data-exchange services use XML for data management and transmission, and the web browser for display and interaction.

XML document
An XML document consists of two parts:
- Prolog: comments, the XML declaration, and the document type declaration. The Document Type Definition (DTD) declares elements, attributes and entities.
- Document element: the document itself, built from start-tags, end-tags, empty-element tags, PCDATA, CDATA, ...

XML document
Example:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!-- this is an XML file -->
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

- <?xml ... ?>: the declaration defines the XML version and the character encoding used in the document
- <!-- ... -->: comment
- <note>: root element
- <to>, <from>, <heading>, <body>: child elements of the root
- </note>: end of the root element

XML document
An XML document is an ordered, labeled tree:
- element nodes: can have child nodes
- leaf nodes: contain the actual data (text strings)

    <note>
     +-- <to>       Tove
     +-- <from>     Jani
     +-- <heading>  Reminder
     +-- <body>     Don't forget me this weekend!

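The tree view can be made concrete with a few lines of code. The sketch below is not from the slides: it uses Python's standard-library `xml.etree.ElementTree` as a stand-in parser, and the helper name `walk` is made up for illustration. It visits the note example in document order and records each node's depth, tag and text.

```python
import xml.etree.ElementTree as ET

# The note example from the slides, without the prolog.
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def walk(element, depth=0):
    """Return (depth, tag, text) for an element and all its descendants, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:  # children come back in document order
        rows.extend(walk(child, depth + 1))
    return rows


tree_rows = walk(ET.fromstring(NOTE_XML))
for depth, tag, text in tree_rows:
    # Indent each node by its depth in the tree.
    print(("  " * depth + f"<{tag}> {text}").rstrip())
```

The root has depth 0 and carries no text of its own; the four children are the leaves that hold the data.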
XML Syntax Rules
Very simple and very strict, and therefore easy to learn and to use.
- XML documents must have one root element.
- All other elements must be within this root element.
- Elements can have sub-elements (child elements).

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

XML Syntax Rules
- All XML elements must have a closing tag:

    <to>.....</to>
    <from>.....</from>
    <heading>.....</heading>
    <body>.....</body>

- XML tags are case sensitive: the tag <Letter> is different from the tag <letter>. Opening and closing tags must therefore be written with the same case:

    <Message>.....</message>    incorrect
    <message>.....</message>    correct

XML Syntax Rules
- XML elements must be properly nested within each other:

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

- XML attribute values must be quoted: XML elements can have attributes in name/value pairs, and the value must always be quoted:

    <note date="12/11/2002">
      <to>.....</to>
      <from>.....</from>
    </note>

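Because the rules are strict, any conforming parser rejects a document that breaks them. The following sketch, which uses Python's standard-library `xml.etree.ElementTree` (not the tool used in the slides), feeds a few small documents to the parser and records whether each one is accepted. The helper name `is_well_formed` and the sample labels are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching tags": "<message>hi</message>",
    "case mismatch": "<Message>hi</message>",
    "missing closing tag": "<note><to>Tove</note>",
    "bad nesting": "<root><child><subchild></child></subchild></root>",
    "quoted attribute": '<note date="12/11/2002"></note>',
    "unquoted attribute": "<note date=12/11/2002></note>",
    "two root elements": "<a></a><b></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the two samples that follow every rule are accepted; each of the others violates exactly one rule from the slides.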
XML Syntax Rules
Naming conventions for elements and attributes:
- Start with a letter or "_", e.g. first, First or _First
- After the first character, digits, "-" and "." are also allowed, e.g. _1st-name or _1st.name
- No spaces and no ":" (the colon is reserved for namespace prefixes)
- Do not start with "xml"

Examples:

    <résumé>        allowed
    <xml-tag>       not allowed (starts with "xml")
    <123>           not allowed (starts with a digit)
    <fun=xml>       not allowed (contains "=")
    <first name>    not allowed (contains a space)

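The naming rules can be checked mechanically. The sketch below is an illustration rather than a complete implementation of the XML name grammar: the function name `follows_naming_rules` is made up, the first test covers the conventions that a parser does not enforce by itself (spaces, ":" and the "xml" prefix), and the second test simply asks Python's standard-library parser whether an empty element with that name is accepted.

```python
import xml.etree.ElementTree as ET


def follows_naming_rules(name):
    """Check a candidate element name against the conventions listed above."""
    # Conventions from the slide that are checked by hand.
    if (name.lower().startswith("xml")
            or ":" in name
            or any(ch.isspace() for ch in name)):
        return False
    try:
        # Let the parser decide whether the remaining characters form a legal name.
        return ET.fromstring(f"<{name}/>").tag == name
    except ET.ParseError:
        return False


candidates = ["first", "_First", "_1st-name", "_1st.name", "résumé",
              "xml-tag", "123", "fun=xml", "first name"]
verdicts = {name: follows_naming_rules(name) for name in candidates}
for name, ok in verdicts.items():
    print(f"<{name}>: {'allowed' if ok else 'not allowed'}")
```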
DTD: Document Type Definition
- The purpose of a DTD is to define which elements, attributes and entities are legal in an XML document, so that the data can be verified.
- A DTD can be declared inside an XML document or as an external reference.

Internal DTD:

    <!DOCTYPE root-element [element-declarations]>

External DTD:

    <!DOCTYPE root-element SYSTEM "filename">

DTD
Example with an internal DTD:

    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend</body>
    </note>

- <!DOCTYPE note ...>: defines the root element
- <!ELEMENT note (to,from,heading,body)>: defines the note element (it contains four elements)
- <!ELEMENT to (#PCDATA)> and the following three lines: define the to, from, heading and body elements (of the type "#PCDATA")

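To show what the content model `(to,from,heading,body)` demands, the sketch below checks it by hand. This is explicitly not DTD validation: Python's standard-library `xml.etree.ElementTree` reads past the DOCTYPE but does not validate against it, so the hypothetical helper `matches_note_model` hard-codes the same constraints (root is `note`, exactly these four children in this order, each containing text only).

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>"""

# Hand-written copy of the content model declared in the DTD above.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text):
    """Return True if the document has the structure the note DTD describes."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    children = list(root)
    right_sequence = [child.tag for child in children] == EXPECTED_CHILDREN
    # (#PCDATA) means character data only, so the children must not contain elements.
    text_only = all(len(child) == 0 for child in children)
    return right_sequence and text_only


valid = matches_note_model(DOCUMENT)
# Swapping two children breaks the declared order.
reordered = matches_note_model(
    "<note><from>Jani</from><to>Tove</to><heading>x</heading><body>y</body></note>"
)
print(valid, reordered)
```

A validating parser performs this kind of check automatically from the DTD, for any document type.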
XML Schema
- XML Schema is an XML-based alternative to DTD.
- It describes the structure of an XML document.
- XML Schemas are expected to replace DTDs because they are:
  - extensible to future additions
  - richer and more powerful than DTDs
  - written in XML
  - able to support data types and namespaces

XML Schema
Example:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.w3schools.com"
               xmlns="http://www.w3schools.com"
               elementFormDefault="qualified">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

- <xs:schema ...>: root element (with some namespace declarations)
- <xs:element name="note"> with <xs:complexType>: complex type (it contains other elements)
- <xs:element name="to" type="xs:string"/> and its siblings: simple types

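Because a schema is itself written in XML, it can be read with any XML parser. The sketch below (Python standard library, not a schema validator; the variable names are made up for illustration) parses the schema above and lists every declared element together with its type. The complex `note` element has no `type` attribute, so its type shows up as `None`.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced names as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"

NOTE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(NOTE_XSD)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [(el.get("name"), el.get("type")) for el in schema.iter(XS + "element")]
for name, type_name in declared:
    print(name, type_name)
```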
XML Schema
Example: XML document with a reference to an XML Schema:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!-- this is an XML file -->
    <note xmlns="http://www.w3schools.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3schools.com note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

- xmlns="..." and xmlns:xsi="...": namespace declarations
- xsi:schemaLocation="...": schema location

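The namespace declarations change how the elements are addressed: every element of this note now belongs to the namespace `http://www.w3schools.com`. A minimal sketch with Python's standard-library parser (the prefix `n` and the variable names are arbitrary choices for this example) shows how the qualified names and the schema location can be read back.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Prefix-to-URI mapping used only for the lookups below.
NS = {"n": "http://www.w3schools.com"}
XSI = "http://www.w3.org/2001/XMLSchema-instance"

note = ET.fromstring(NOTE_XML)

# The default namespace becomes part of every element name.
root_name = note.tag
recipient = note.find("n:to", NS).text

# Attributes with a prefix are stored under "{namespace-uri}localname".
schema_location = note.get(f"{{{XSI}}}schemaLocation")
namespace_uri, schema_file = schema_location.split()

print(root_name)
print(recipient, namespace_uri, schema_file)
```

Note that a plain `note.find("to")` returns `None` here, because an unqualified `to` is a different name from `to` in the w3schools namespace.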
CSS: Cascading Style Sheets
HTML style sheets
- It is possible to use CSS to format an XML document.
- However, formatting XML with CSS is NOT the future of styling XML documents.

XSL: eXtensible Stylesheet Language
- XSL is the preferred style sheet language of XML.
- XSL describes how the XML document should be displayed.
- XSL consists of three parts:
  - XSLT: a language for transforming XML documents
  - XPath: a language for navigating in XML documents
  - XSL-FO: a language for formatting XML documents
- XSLT is the most important part of XSL.

XSLT
XSLT is used to:
- transform XML documents into XHTML documents or into other XML documents
- rearrange and sort elements, perform tests and make decisions about which elements to hide and display
- add elements and attributes to, or remove them from, the output file

XSLT
Example: XML document with an XSL style sheet reference

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
    <catalog>
      <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
      </cd>
      ...
    </catalog>

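Path expressions such as `catalog/cd` are how XSLT selects nodes in this document. Python's standard-library `ElementTree` supports a limited subset of XPath, which is enough for a small sketch. The second `<cd>` entry below is a made-up placeholder (the slides elide the rest of the catalog with "..."), and all variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <!-- hypothetical second entry, not from the slides -->
    <title>Example Album</title>
    <artist>Example Artist</artist>
    <country>CH</country>
    <company>Example Records</company>
    <price>8.50</price>
    <year>1990</year>
  </cd>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Paths are relative to the root element <catalog>.
titles = [title.text for title in catalog.findall("cd/title")]

# A predicate keeps only the <cd> elements whose <country> child equals "USA".
usa_titles = [cd.findtext("title") for cd in catalog.findall("cd[country='USA']")]

# iter() visits matching elements at any depth.
prices = [float(price.text) for price in catalog.iter("price")]

print(titles)
print(usa_titles)
print(prices)
```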
XSLT
Example: XSLT style sheet document

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body>
            <h2>My CD Collection</h2>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th align="left">Title</th>
                <th align="left">Artist</th>
              </tr>
              <xsl:for-each select="catalog/cd">
                <tr>
                  <td><xsl:value-of select="title"/></td>
                  <td><xsl:value-of select="artist"/></td>
                </tr>
              </xsl:for-each>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

- <xsl:stylesheet ...>: marks the document as an XSLT style sheet
- <xsl:template match="/">: the match="/" attribute associates the template with the root of the XML document
- the content of the template defines some HTML to write to the output

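To make clear what the style sheet produces, the sketch below performs the same transformation by hand. It is not an XSLT processor: Python's standard library has none, so the hypothetical function `catalog_to_html` re-implements this one template directly, with comments pointing to the XSLT instruction each step corresponds to.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
</catalog>"""


def catalog_to_html(catalog_xml):
    """Build the HTML table that the style sheet above describes."""
    catalog = ET.fromstring(catalog_xml)

    # Literal result elements of the template.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "My CD Collection"
    table = ET.SubElement(body, "table", border="1")
    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    for label in ("Title", "Artist"):
        ET.SubElement(header, "th", align="left").text = label

    # Corresponds to <xsl:for-each select="catalog/cd">.
    for cd in catalog.findall("cd"):
        row = ET.SubElement(table, "tr")
        # Corresponds to <xsl:value-of select="title"/> and select="artist".
        ET.SubElement(row, "td").text = cd.findtext("title")
        ET.SubElement(row, "td").text = cd.findtext("artist")

    return ET.tostring(html, encoding="unicode")


html_text = catalog_to_html(CATALOG_XML)
print(html_text)
```

The advantage of XSLT over such hand-written code is that the mapping lives in a declarative document that any XSLT processor, including a web browser, can apply.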
R and XML
"XML" package for R:
- Newest version: 1.93-2 (03.10.2007)
- Tools for parsing and generating XML within R
- Can also parse HTML documents
- Can also parse DTDs
- etc.

R and XML
Using R to get specific data from between two XML tags
Example: XML file

    <?xml version="1.0" encoding="iso-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="stylesheet2.xsl"?>
    <uebung>
      <beispiel_1>
        <titel>Beispiel 1</titel>
        <bsp1>
    1 2 3 4
    5 6 7 8
    9 10 11 12
    13 14 15 16
        </bsp1>
      </beispiel_1>
      <beispiel_2>
        <titel>Beispiel 2</titel>
        <bsp2>
    9 8 7 6 5 4 3 2
        </bsp2>
      </beispiel_2>
    </uebung>

Data of interest: the numbers between the <bsp1> tags.

R and XML
Example: R file

    #---- load the XML package ----
    library(XML)

    #---- read the data inside an XML tag ----
    filename <- "D:/Heidi/Master/R_Daten/XML/Beispiel.xml"
    tag <- "//bsp1"
    doc <- xmlTreeParse(filename, useInternalNodes = TRUE)
    tagData <- xpathApply(doc, tag, xmlValue)
    print(tagData, indent = FALSE)

- xmlTreeParse: parses an XML or HTML file and generates an R structure representing the XML/HTML tree
- xpathApply: identifies the nodes of interest and applies the given function to each of them
- xmlValue: extracts the contents of a leaf XML node (text node)

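For readers without R at hand, the same three steps (parse, select nodes by path, extract their text) can be sketched with Python's standard library. This is a rough equivalent, not a translation of the R package's behavior: the document is given as a string instead of the file path used above, the variable names are made up, and `.//bsp1` is ElementTree's spelling of the XPath `//bsp1`.

```python
import xml.etree.ElementTree as ET

UEBUNG_XML = """<uebung>
  <beispiel_1>
    <titel>Beispiel 1</titel>
    <bsp1>
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
    </bsp1>
  </beispiel_1>
  <beispiel_2>
    <titel>Beispiel 2</titel>
    <bsp2>
9 8 7 6 5 4 3 2
    </bsp2>
  </beispiel_2>
</uebung>"""

# Step 1: parse the document into a tree.
root = ET.fromstring(UEBUNG_XML)

# Step 2: select every <bsp1> element, at any depth.
nodes = root.findall(".//bsp1")

# Step 3: extract the text content of each selected node.
tag_data = [node.text for node in nodes]

print(tag_data)
```

As in the R version, the result is a list with one string per matching tag; the string still contains the line breaks and whitespace from the file.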
R and XML
Example: output in R: a list containing the tag content as a string

    > print(tagData)
    [[1]]
    [1] "\n1 2 3 4\n5 6 7 8\n9 10 11 12\n13 14 15 16\n "

Now the data can be converted to a numeric matrix, table, ...

    > numericMatrix
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    5    6    7    8
    [3,]    9   10   11   12
    [4,]   13   14   15   16

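The slides do not show the conversion step itself. One way to do it, sketched here in Python with a made-up helper name `text_to_matrix`, is to split the string into lines, skip the lines that contain only whitespace, and convert each remaining token to an integer.

```python
# The string extracted from the <bsp1> tag, as printed above.
tag_text = "\n1 2 3 4\n5 6 7 8\n9 10 11 12\n13 14 15 16\n "


def text_to_matrix(text):
    """Turn whitespace-separated numbers, one row per line, into a list of integer rows."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip empty or whitespace-only lines
            rows.append([int(value) for value in line.split()])
    return rows


numeric_matrix = text_to_matrix(tag_text)
for row in numeric_matrix:
    print(row)
```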
Links
XML tutorials:
- http://www.w3schools.com/xml/ (English)
- http://www.bitworld.de/grundlagen_xml.html (German)
XML in R:
- http://cran.r-project.org/src/contrib/Descriptions/XML.html (package download)
- http://cran.r-project.org/doc/packages/XML.pdf (function descriptions)
Wiki:
- http://pegnose.homelinux.org/heidi/