CIT 383: Administrative Scripting
XML
Topics
- What is XML?
- XML Structure
- REXML
eXtensible Markup Language
- Extensible, descriptive markup language framework.
- Began as a subset of Standard Generalized Markup Language (SGML).
- Designed to ensure that data remains available after the programs that originally created or read it become obsolete or unusable.

    <?xml version="1.0" encoding="UTF-8"?>
    <inventory>
      <book isbn=" ">
        <author>Chris Pine</author>
        <title>Learn to Program</title>
      </book>
    </inventory>
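As a sketch of how a Ruby script might produce this kind of document, the code below builds the inventory node by node with REXML (covered later in these slides) and pretty-prints it. It assumes the `rexml` library is installed (it ships with Ruby); the `isbn` attribute is left out because its value is not given here.

```ruby
require 'rexml/document'

# Build the inventory document node by node.
doc = REXML::Document.new
doc << REXML::XMLDecl.new('1.0', 'UTF-8')
inventory = doc.add_element('inventory')
book = inventory.add_element('book')
book.add_element('author').text = 'Chris Pine'
book.add_element('title').text = 'Learn to Program'

# Pretty-print with 2-space indentation; compact mode keeps short
# text-only elements such as <title> on a single line.
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
out = +''
formatter.write(doc, out)
puts out
```

Because the output is plain, self-describing text, any later program with an XML parser can read it back.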
Descriptive vs Presentational
- Presentational languages describe how documents should look.
  - <b>text</b> turns on boldface for text.
  - What if you want to change book titles from bold to italics? A global search-and-replace of <b> won't work if items other than book titles are also bold.
- Descriptive languages focus on meaning.
  - <title>xml and you</title>
  - Stylesheets describe how to present each logical item.
  - Descriptive markup can also be used purely for data storage and interchange.
  - A/K/A logical or structural markup languages.
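To make the bold-to-italics point concrete, here is a minimal sketch that renders descriptive markup as presentational HTML. The `STYLE` table and the `render` method are hypothetical names chosen for illustration; the point is that switching titles from bold to italics means changing one table entry rather than hunting for every `<b>`.

```ruby
require 'rexml/document'

# Hypothetical "stylesheet": maps each logical element to an HTML tag.
STYLE = { 'author' => 'b', 'title' => 'i' }.freeze

# Render each child element of a descriptive <book> as presentational HTML.
def render(book)
  book.elements.map do |e|
    html = REXML::Element.new(STYLE.fetch(e.name, 'span'))
    html.text = e.text # Element#text= escapes &, < and > on output
    html.to_s
  end.join(' ')
end

book = REXML::Document.new(
  '<book><author>Chris Pine</author><title>Learn to Program</title></book>'
).root
html = render(book)
puts html
```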
XML-based Languages
Ant, Atom, CML, MathML, MML, MusicXML, ODF, OPML, RDF, SAML, SOAP, SVG, VoiceXML, WML, XHTML, XUL
Evolution of XML
- 1986: SGML standard published as ISO 8879
- 1987: Unicode proposal published
- 1991: First volume of the Unicode standard
- 1996: XML work started
- 1998: XML 1.0 released as a W3C standard
- 2001: XML Schema language
- 2004: XML 1.1 released (not widely used)
- 2007: Unicode 5.0 published
XML Tree Structure

    <todo>
      <title>Monday's List</title>
      <item priority="10">Study for midterm</item>
      <item>Scripting Class</item>
      <item>Bathe cat</item>
    </todo>
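A short recursive walk makes the tree shape visible. This sketch uses a to-do list like the one above (the element layout, including where `priority` sits, is assumed for illustration) and a hypothetical helper named `outline`.

```ruby
require 'rexml/document'

xml = <<~XML
  <todo>
    <title>Monday's List</title>
    <item priority="10">Study for midterm</item>
    <item>Scripting Class</item>
    <item>Bathe cat</item>
  </todo>
XML

# Depth-first walk: one line per element, indented by its depth in the tree.
def outline(element, depth = 0, lines = [])
  label = element.name
  text = element.text.to_s.strip # ignore whitespace-only text between tags
  label += ": #{text}" unless text.empty?
  lines << ('  ' * depth) + label
  element.each_element { |child| outline(child, depth + 1, lines) }
  lines
end

lines = outline(REXML::Document.new(xml).root)
puts lines
```

The root element is the single top node; every other element hangs below exactly one parent.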
Elements and Attributes
- An element consists of tags and contents: <title>Learn to Program</title>
- Begin and end tags are mandatory; an element with no content may instead use a single self-closing tag: <isbn number=" " />
- Attributes: number=" "
  - Elements may have zero or more attributes.
  - Attribute values must always be quoted (single or double quotes).
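The attribute rules can be tried directly in REXML. In this sketch the `format` and `pages` attributes are made up for illustration; it reads, adds, and removes attributes, and shows that an unquoted value is rejected by the parser (assuming REXML reports it as a `REXML::ParseException`, which current versions do).

```ruby
require 'rexml/document'

# format and pages are hypothetical attributes; both quote styles are legal.
book = REXML::Document.new(
  '<book format="paperback" pages=\'100\'><title>Learn to Program</title></book>'
).root

book_format = book.attributes['format']  # look up an attribute by name
count       = book.attributes.size       # elements have zero or more attributes
book.add_attribute('lang', 'en')         # attributes can be added...
book.delete_attribute('pages')           # ...or removed

# An unquoted attribute value is not well-formed XML.
rejected = begin
  REXML::Document.new('<book pages=100/>')
  false
rescue REXML::ParseException
  true
end

puts book_format, count, book.to_s, rejected
```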
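Text
- The XML declaration specifies the character encoding: <?xml version="1.0" encoding="UTF-8"?>
- Encodings
  - Unicode: universal character set; encodings include UTF-8 and UTF-32.
  - ISO-8859: 8-bit encodings; ISO-8859-1 is Western European.
- Entities
  - &#nnnn; encodes the specified Unicode character (decimal code point; &#xhhhh; for hexadecimal).
  - &name; refers to a named character entity, such as &lt; for <, &gt; for >, and &amp; for &.
  - XML predefines only five named entities (&lt;, &gt;, &amp;, &quot;, &apos;); others, such as currency symbols, fractions, Greek letters, and math symbols, must be declared in a DTD (as XHTML does).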
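A quick way to see entities at work is to round-trip text through REXML, assuming the `rexml` library is installed. On input, `Element#text` returns the text with entity and character references expanded; on output, special characters in assigned text are escaped.

```ruby
require 'rexml/document'

# Parsing: &lt;, &amp; and the numeric reference &#233; (e-acute) are expanded.
doc = REXML::Document.new('<expr>5 &lt; 6 &amp;&amp; caf&#233;</expr>')
decoded = doc.root.text

# Writing: < and & in assigned text are escaped when the element is serialized.
note = REXML::Element.new('note')
note.text = 'a < b & c'
encoded = note.to_s

puts decoded
puts encoded
```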
XML Syntax Rules
- There is one and only one root element.
- Begin tags must be matched by an end tag.
- XML tags must be properly nested.
- XML tags are case sensitive.
- All attribute values must be quoted.
- Whitespace in element content is part of the text.
- Line endings are normalized: the parser stores every newline as LF.
- HTML-style comments: <!-- comment -->
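Three of these rules are easy to observe in REXML (assumed installed): whitespace inside content survives parsing, comments become their own nodes rather than elements, and element lookups are case sensitive. The small to-do document is made up for illustration.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  '<todo><!-- due Monday --><item>  Bathe cat  </item></todo>'
)

item    = doc.root.elements['item']
text    = item.text                       # surrounding spaces are kept
comment = doc.root.comments.first.string  # comments are nodes, not elements
wrong   = doc.root.elements['Item']       # names are case sensitive: no match

p text, comment, wrong
```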
Correctness
- Well-formed
  - Conforms to XML syntax rules.
  - A conforming parser will not parse documents that are not well-formed; it reports a fatal error instead.
- Valid
  - Well-formed, and also conforms to the structural rules defined in a Document Type Definition (DTD) or an XML Schema.
  - A validating parser reports an error for invalid documents.
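A well-formedness check can be sketched as "try to parse it". The helper `well_formed?` below is hypothetical; it relies on REXML raising `REXML::ParseException` for syntax errors. Note that this checks well-formedness only, not validity against a DTD or schema.

```ruby
require 'rexml/document'

# Hypothetical helper: true if REXML can parse the string.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

samples = {
  '<a><b/></a>'    => 'properly nested',
  '<a><b></a></b>' => 'improperly nested',
  '<a></A>'        => 'case mismatch',
  '<a>'            => 'missing end tag'
}
samples.each do |xml, label|
  puts format('%-18s %-16s %s', label, xml, well_formed?(xml))
end
```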
XML Schema Languages

    <?xml version="1.0" encoding="utf-8" ?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Recipient" type="xs:string" />
            <xs:element name="House" type="xs:string" />
            <xs:element name="Street" type="xs:string" />
            <xs:element name="Town" type="xs:string" />
            <xs:element minOccurs="0" name="County" type="xs:string" />
            <xs:element name="PostCode" type="xs:string" />
            <xs:element name="Country">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="FR" />
                  <xs:enumeration value="DE" />
                  <xs:enumeration value="ES" />
                  <xs:enumeration value="UK" />
                  <xs:enumeration value="US" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

- Document Type Definitions (DTD)
  - Inherited from SGML.
  - Do not support all XML features (e.g., namespaces, data types).
- XML Schema
  - Most commonly used.
  - Schemas are themselves XML documents.
  - A/K/A WXS, XSD.
- RELAX NG
  - REgular LAnguage for XML Next Generation.
  - Has both an XML syntax and a compact non-XML syntax.

Schema example from
Ruby XML Parsers
- REXML: Ruby Electric XML
  - Standard with the Ruby language.
  - Slow on large documents.
- libxml-ruby
  - Ruby bindings for the GNOME libxml2 XML toolkit.
  - Very fast (30X as fast as REXML).
- Hpricot
  - Parses XML as well as HTML.
  - Fast (3-4X as fast as REXML).
  - Does not check for well-formedness or validity.
Types of Parsing
- Tree parsing (DOM-like)
  - Good for small documents.
  - Loads the entire document into memory.
  - Simple API.
- Stream parsing (SAX-like)
  - Good for large documents.
  - User defines callback methods and passes them to the API.
  - Parser calls the matching callback for each event it encounters (start tag, end tag, text, ...).
Tree Parsing
- Loads entire XML doc into memory.

    require 'rexml/document'
    include REXML

    # Parse the whole file into an in-memory tree (the block closes the file).
    doc = File.open('data.xml') { |input| Document.new(input) }
    root = doc.root

- Search the document as a tree using XPath:

    # Print the title attribute of every <section> under the <ch> root element.
    doc.elements.each('ch/section') do |e|
      puts e.attributes['title']
    end
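Since `data.xml` is not shown, this self-contained sketch substitutes an in-memory string with an assumed `<ch>`/`<section title="...">` layout. It runs the same `ch/section` XPath search, then uses the fact that the whole tree is in memory to edit it and write it back out, something a stream parser cannot do directly.

```ruby
require 'rexml/document'

# Stand-in for data.xml: a <ch> root containing <section> elements (assumed layout).
xml = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="REXML"/>
  </ch>
XML

doc = REXML::Document.new(xml)

# XPath search over the tree, collecting titles instead of printing them.
titles = []
doc.elements.each('ch/section') { |e| titles << e.attributes['title'] }

# The tree is fully in memory, so it can be modified in place...
doc.root.add_element('section', 'title' => 'XPath Searches')
count = doc.elements.to_a('ch/section').size

# ...and serialized again.
puts titles, count
doc.write($stdout)
puts
```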
Stream Parsing
- Define a listener class:

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML

    class MyListener
      include REXML::StreamListener

      # Called for each start tag with the tag name and a hash of its attributes.
      def tag_start(*args)
        puts "start: #{args.map { |x| x.inspect }.join(',')}"
      end
    end

- Invoke the parser:

    listen = MyListener.new
    File.open('data.xml') do |source|
      Document.parse_stream(source, listen)
    end

Example from The Ruby Way, 2/e, pp
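A listener usually needs a little state to know where it is in the document. This sketch, with a hypothetical `TitleListener` class and a made-up two-book inventory, collects the text of every `<title>` element without ever building a tree. `REXML::StreamListener` supplies no-op defaults, so only the callbacks of interest are overridden.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <title> element as the parser streams past it.
class TitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  def tag_start(name, _attrs)
    @in_title = true if name == 'title'
  end

  def tag_end(name)
    @in_title = false if name == 'title'
  end

  # Text events also fire for whitespace between tags; keep only title text.
  def text(data)
    @titles << data if @in_title
  end
end

xml = <<~XML
  <inventory>
    <book><author>Chris Pine</author><title>Learn to Program</title></book>
    <book><author>Hal Fulton</author><title>The Ruby Way</title></book>
  </inventory>
XML

listener = TitleListener.new
REXML::Document.parse_stream(xml, listener)
puts listener.titles
```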
XPath Searches
- doc.search("p")
  - Find all paragraph tags in the document.
- doc.search("/html/body//p")
  - Find all paragraph tags within the body tag.
- Find all anchor tags with an href attribute.
- Find all a tags with an href attribute of google.com.
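The same four searches can be expressed as XPath with REXML's `REXML::XPath.match`. This is a sketch against a made-up XHTML-like page; it assumes anchors carry their target in `href` and that "google.com" means the full value `http://google.com`.

```ruby
require 'rexml/document'

# Small XHTML-like sample page (made up for illustration).
page = <<~XML
  <html>
    <body>
      <p>First paragraph</p>
      <div><p>Nested <a href="http://google.com">Google</a></p></div>
      <a href="http://example.com">Example</a>
      <a name="top">Anchor without href</a>
    </body>
  </html>
XML

doc = REXML::Document.new(page)

all_p     = REXML::XPath.match(doc, '//p')            # every <p> in the document
body_p    = REXML::XPath.match(doc, '/html/body//p')  # <p> anywhere under <body>
with_href = REXML::XPath.match(doc, '//a[@href]')     # <a> that has an href
google    = REXML::XPath.match(doc, "//a[@href='http://google.com']")

puts all_p.size, body_p.size, with_href.size, google.first.text
```

`//` matches at any depth, so the paragraph nested inside `<div>` is found by both paragraph searches.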
References
- Michael Fitzgerald, Learning Ruby, O'Reilly.
- David Flanagan and Yukihiro Matsumoto, The Ruby Programming Language, O'Reilly, 2008.
- Hal Fulton, The Ruby Way, 2nd edition, Addison-Wesley, 2007.
- Robert C. Martin, Clean Code, Prentice Hall.
- Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby, 2nd edition, Pragmatic Programmers, 2005.