Extensible Markup Language: XML


1 Extensible Markup Language: XML
XML was developed by the World Wide Web Consortium’s (W3C’s) XML Working Group (1996)
XML is a portable, widely supported technology for describing data
XML is quickly becoming the standard for data exchange between applications

XML Documents
XML marks up data using tags, which are names enclosed in angle brackets
All tags appear in pairs: <tag> .. </tag>
Elements: units of data (i.e., everything included between a start tag and its corresponding end tag)
Root element contains all other document elements
Tag pairs cannot appear interleaved: <a><b> .. </a></b> is not allowed. Must be: <a><b> .. </b></a>
Nested elements form hierarchies (trees)
Thus: What defines an XML document is not its tag names but that it has tags that are formatted in this way.
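The pairing and nesting rules can be checked mechanically, because a parser rejects any document that breaks them. The following is a minimal sketch in Python 3 (the slides' own code is Python 2); the helper name is_well_formed and the three sample strings are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


# Properly nested: <b> opens and closes inside <a>.
nested = "<a><b>data</b></a>"
# Interleaved: <a> is closed while <b> is still open.
interleaved = "<a><b>data</a></b>"
# The end tag of <a> is missing, so the tags do not pair up.
unpaired = "<a><b>data</b>"

for sample in (nested, interleaved, unpaired):
    print(sample, "->", is_well_formed(sample))
```

Only the first sample is accepted; the tag names themselves play no role in the decision.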

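3 article.xml

    <?xml version = "1.0"?>
    <article>
       <title>Simple XML</title>
       <date>December 21, 2001</date>
       <author>
          <firstName>John</firstName>
          <lastName>Doe</lastName>
       </author>
       <summary>XML is pretty easy.</summary>
       <content>In this chapter, we present a wide variety of examples
          that use XML.</content>
    </article>

End tag has format </start tag name>
Root element contains all other document elements
Optional XML declaration includes version information parameter
XML comments are delimited by <!-- and -->

Because of the nice <tag>..</tag> structure, the data can be viewed as organized in a tree:

    article
       title
       date
       author
          firstName
          lastName
       summary
       content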

4 An I-sequence structured as XML
TYPE: dna
NAME: Aspergillus awamori
ID: U03518
DATA:
    aacctgcggaaggatcattaccgagtgcgggtcctttgggccca
    acctcccatccgtgtctattgtaccctgttgcttcgg
    cgggcccgccgcttgtcggccgccgggggggcgcctctg
    ccccccgggcccgtgcccgccggagaccccaacacgaac
    actgtctgaaagcgtgcagtctgagttgattgaatgcaat
    cagttaaaactttcaacaatggatctcttggttccggc
[Figure: tree with the nodes SEQUENCEDATA, TYPE, SEQ, NAME, ID, DATA]

5 Parsing and displaying XML
XML is just another data format. Do we need to write yet another parser? No more filters, please!
No! XML is becoming standard
Many different systems can read XML; not many systems can read our I-sequence format
Thus, parsers exist already

6 XML document opened in Internet Explorer
Each parent element/node can be expanded and collapsed (a minus sign marks an expanded node, a plus sign a collapsed one)

7 XML document opened in Mozilla Again: Each parent element/node can be expanded and collapsed (here by pressing the minus, not the element)

8 letter.xml
[Listing: first part of letter.xml. Surviving text content: Jane Doe; Box ..; .. Any Ave.; Othertown; Otherstate; John Doe; .. Main St.; Anytown; Anystate; Dear Sir:]
Attributes
Data can also be placed in attributes: name/value pairs
Attribute (name-value pair, value in quotes): element contact has the attribute type, which has the value “from”
Empty elements do not contain character data. The tags of an empty element may be written as one tag ending in />
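Attributes and empty elements can be inspected with the DOM parser that the later slides introduce. This is a small sketch in Python 3; only the contact element with type="from" is taken from the slide, while the letter root element and the flag element with its urgent attribute are invented here to have something to parse.

```python
from xml.dom.minidom import parseString

# Hypothetical document: only <contact type="from"> comes from the slide.
doc = parseString(
    '<letter>'
    '<contact type="from"><name>Jane Doe</name></contact>'
    '<flag urgent="yes"/>'
    '</letter>'
)

contact = doc.getElementsByTagName("contact")[0]
sender_type = contact.getAttribute("type")   # value of an existing attribute
has_zip = contact.hasAttribute("zip")        # this attribute was never set

flag = doc.getElementsByTagName("flag")[0]
flag_has_children = flag.hasChildNodes()     # an empty element holds no data
flag_urgent = flag.getAttribute("urgent")    # but it can still carry attributes

print(sender_type, has_zip, flag_has_children, flag_urgent)
```

The empty element carries all of its information in the attribute, which is the usual reason for writing one.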

9 letter.xml (continued)
[Listing: rest of letter.xml. Surviving text content:]
It is our privilege to inform you about our new database managed with XML. This new system allows you to reduce the load on your inventory list server by having the client machine perform the work of sorting and filtering the data.
Please visit our Web site for availability and pricing.
Sincerely
Ms. Doe

10 Intermezzo 1
1. Finish this i2xml.py filter so it translates a list of Isequence objects into XML (following the above structure) and saves it in a file. Assume the list contains only one Isequence object. Use your module with this driver program and translate this Fasta file into XML. Load the resulting XML file into a browser.
2. Change the XML structure defined by your filter so that TYPE is no longer a tag by itself but an attribute of the SEQ tag (see page 496).
3. Modify your i2xml filter so that it can now translate a list of several Isequence objects into one XML file, using the structure from part 2. Test your program with the same driver on this Fasta file.
All files can be found from the Example Programs page

11 solution

    from Isequence import Isequence
    import sys

    # Save a list of Isequences in XML
    class SaveToFiles:
        """Stores a list of ISequences in XML format"""

        def save_to_files(self, iseqlist, savefilename):
            # Tag names match those expected by the xml2i parser
            # (SEQUENCEDATA, SEQ with a type attribute, NAME, ID, DATA)
            try:
                savefile = open(savefilename, "w")
                print >> savefile, "<SEQUENCEDATA>"
                for seq in iseqlist:
                    print >> savefile, '  <SEQ type="%s">' % seq.get_type()
                    print >> savefile, "    <NAME>%s</NAME>" % seq.get_name()
                    print >> savefile, "    <ID>%s</ID>" % seq.get_id()
                    print >> savefile, "    <DATA>%s</DATA>" % seq.get_sequence()
                    print >> savefile, "  </SEQ>"
                print >> savefile, "</SEQUENCEDATA>"
                savefile.close()
            except IOError, message:
                sys.exit(message)
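One detail the solution does not deal with is text that contains characters such as & or <, which would make the output ill-formed. A possible Python 3 sketch of the same filter with escaping is shown below. It is not the course's solution: the Seq namedtuple is a hypothetical stand-in for the Isequence class, and the function returns a string instead of writing a file.

```python
from collections import namedtuple
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape, quoteattr

# Hypothetical stand-in for the course's Isequence class.
Seq = namedtuple("Seq", "type name id data")


def sequences_to_xml(seqs):
    """Return the sequences as one XML string, escaping special characters."""
    lines = ["<SEQUENCEDATA>"]
    for seq in seqs:
        # quoteattr escapes the value and adds the surrounding quotes.
        lines.append("  <SEQ type=%s>" % quoteattr(seq.type))
        lines.append("    <NAME>%s</NAME>" % escape(seq.name))
        lines.append("    <ID>%s</ID>" % escape(seq.id))
        lines.append("    <DATA>%s</DATA>" % escape(seq.data))
        lines.append("  </SEQ>")
    lines.append("</SEQUENCEDATA>")
    return "\n".join(lines) + "\n"


# The name contains '&' and '<', which must not appear raw in XML text.
xml_text = sequences_to_xml([Seq("dna", "A. awamori & <friends>", "U03518", "aacctg")])
print(xml_text)

# Round trip: the parser accepts the output and restores the original name.
first_seq = parseString(xml_text).getElementsByTagName("SEQ")[0]
parsed_name = first_seq.getElementsByTagName("NAME")[0].firstChild.nodeValue
print(parsed_name)
```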

12 solution XML file loaded in Internet Explorer

13 Parsers and trees
We’ve already seen that XML markup can be displayed as a tree
Some XML parsers exploit this. They
–parse the file
–extract the data
–return it organized in a tree data structure called a Document Object Model
[Figure: tree for article.xml: article with children title, date, author (firstName, lastName), summary, content]

Document Object Model (DOM)
A DOM parser retrieves data from an XML document
Hierarchical tree structure called a DOM tree
Each component of an XML document is represented as a tree node
Parent nodes contain child nodes
Sibling nodes have the same parent
A single root (or document) node contains all other document nodes

15 DOM tree of previous example
Fig. 15.6 Tree structure for article.xml.

    article
       title            Simple XML
       date             December 21, 2001
       author
          firstName     John
          lastName      Doe
       summary          XML is pretty easy.
       content          In this chapter, we present a wide variety of examples that use XML.

Figure labels: one single document root node, sibling nodes, parent node, child nodes
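The tree in the figure can be reproduced from the document itself by walking the DOM recursively. The sketch below is Python 3 and assumes the article.xml content reconstructed from this deck; the function name outline is made up for the illustration.

```python
from xml.dom.minidom import parseString

ARTICLE = """<?xml version="1.0"?>
<article>
  <title>Simple XML</title>
  <date>December 21, 2001</date>
  <author>
    <firstName>John</firstName>
    <lastName>Doe</lastName>
  </author>
  <summary>XML is pretty easy.</summary>
  <content>In this chapter, we present a wide variety of examples that use XML.</content>
</article>
"""


def outline(node, depth=0):
    """Return the element names below node as indented lines, depth-first."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines  # text nodes contribute nothing


document = parseString(ARTICLE)
tree_lines = outline(document.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one parent/child step in the DOM tree.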

16 Python provides a DOM parser!
all nodes have a name (of tag) and a value
text (incl. whitespace) is represented in nodes with tag name #text

    article
       title
          #text    Simple XML
       date
          #text    Dec..
       author
          firstName
             #text    John
          lastName
             #text    Doe
       summary
          #text    XML.. easy.
       content
          #text    In this..XML.

The whitespace between two tags also becomes a #text node; the figure shows only some of these.
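The effect of whitespace on the DOM tree is easy to see by parsing the same data with and without line breaks. A minimal Python 3 sketch follows; the two sample strings and the helper names are chosen for illustration only.

```python
from xml.dom.minidom import parseString

# Same data, once with line breaks and indentation, once without.
pretty = "<article>\n  <title>Simple XML</title>\n  <date>December 21, 2001</date>\n</article>"
compact = "<article><title>Simple XML</title><date>December 21, 2001</date></article>"


def child_names(xml_text):
    """Return the node names of the root element's direct children."""
    root = parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes]


def element_children(node):
    """Return only the element children, dropping text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


pretty_names = child_names(pretty)      # whitespace shows up as '#text' nodes
compact_names = child_names(compact)    # no whitespace, no extra nodes
filtered = [c.nodeName for c in element_children(parseString(pretty).documentElement)]

print(pretty_names)
print(compact_names)
print(filtered)
```

Filtering on nodeType is one common way to ignore the whitespace nodes; code that relies on firstChild or nextSibling has to keep them in mind.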

17 revised fig16_04.py

    import sys
    from xml.dom.minidom import parse          # stuff we have to import
    from xml.parsers.expat import ExpatError   # the book uses an old version

    ..   # (elided on the slide: the code that opens the XML file as 'file')

    try:
        # parse XML document and load data into variable document
        document = parse( file )
        file.close()
    except ExpatError:
        sys.exit( "Error processing XML file" )

    # get root element of the DOM tree;
    # documentElement attribute refers to root node
    rootElement = document.documentElement

    # nodeName refers to element's tag name
    print "Here is the root element of the document: %s" % rootElement.nodeName

    # traverse all child nodes of root element
    # (childNodes is the list of a node's children)
    print "The following are its child elements:"
    for node in rootElement.childNodes:
        print node.nodeName

    # get first child node of root element
    child = rootElement.firstChild
    print "\nThe first child of root element is:", child.nodeName
    print "whose next sibling is:",

    # get next sibling of first child
    sibling = child.nextSibling
    print sibling.nodeName

    print 'Text inside "' + sibling.nodeName + '" tag is',
    textnode = sibling.firstChild
    print textnode.nodeValue

    print "Parent node of %s is: %s" % ( sibling.nodeName, sibling.parentNode.nodeName )

Other node attributes: firstChild, nextSibling, nodeValue, parentNode

18 Program output

    Here is the root element of the document: article
    The following are its child elements:
    #text
    title
    #text
    date
    #text
    author
    #text
    summary
    #text
    content
    #text

    The first child of root element is: #text
    whose next sibling is: title
    Text inside "title" tag is Simple XML
    Parent node of title is: article

The text of the title element is itself a child node of title:

    ..
    print 'Text inside "' + sibling.nodeName + '" tag is',
    textnode = sibling.firstChild
    # print text value of sibling
    print textnode.nodeValue
    ..

[Figure: DOM tree of article.xml with its #text nodes, as on slide 16]

19 Parsing XML sequence?
We have i2xml filter – we want xml2i also
Don’t have to write XML parser, Python provides one
Thus, algorithm:
–Open file
–Use Python parser to obtain the DOM tree
–Traverse tree to extract sequence information, build Isequence objects
Ignoring whitespace nodes, we have to search a tree like this:

    SEQUENCEDATA
       SEQ (type)
          NAME
          ID
          DATA
       SEQ (type)
          NAME
          ID
          DATA

20 part 1:2

    from Isequence import Isequence
    import sys
    from xml.dom.minidom import parse
    from xml.parsers.expat import ExpatError

    class Parser:
        """Parses xml file, stores sequences in Isequence list"""

        def __init__( self ):
            self.iseqlist = []   # make empty list

        def parse_file( self, loadfilename ):
            try:
                loadfile = open( loadfilename, "r" )
            except IOError, message:
                sys.exit( message )

            # Use Python's own xml parser to parse xml file:
            try:
                dom = parse( loadfilename )
                loadfile.close()
            except ExpatError:
                sys.exit( "Couldn't parse xml file" )

            # now dom is our dom tree structure. Was the xml file a sequence file?
            if dom.documentElement.nodeName == "SEQUENCEDATA":
                # recursively search the parse tree:
                for child in dom.documentElement.childNodes:
                    self.traverse_dom_tree( child )
            else:
                sys.exit( "This is not a sequence file" )

            return self.iseqlist

21 part 2:2

        def traverse_dom_tree( self, node ):
            """Recursive method that traverses the DOM tree"""

            if node.nodeName == "SEQ":
                # marks the beginning of a new sequence
                self.iseq = Isequence()              # make new Isequence object
                self.iseqlist.append( self.iseq )    # add to list

                newformat = 0
                # the type should be an attribute of the SEQ tag.
                # go through all attributes of this node:
                for i in range( node.attributes.length ):
                    if node.attributes.item(i).name == "type":
                        # good, found a 'type' attribute
                        newformat = 1
                        # get the value of the attribute, put it in the Isequence:
                        self.iseq.set_type( node.getAttribute( "type" ) )
                        break

                if not newformat:
                    # we didn't find any 'type' attribute, this is old format
                    print "No 'type' attribute in element SEQ"

                # next recursively traverse the child nodes of this SEQ node:
                for child in node.childNodes:
                    self.traverse_dom_tree( child )

            elif node.nodeName == "NAME":
                self.iseq.set_name( node.firstChild.nodeValue )

            elif node.nodeName == "ID":
                self.iseq.set_id( node.firstChild.nodeValue )

            elif node.nodeName == "DATA":
                self.iseq.set_sequence( node.firstChild.nodeValue )

[Figure: subtree SEQ (type) with children NAME, ID, DATA]
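As a compact alternative to the recursive traversal, the DOM method getElementsByTagName can pick out the SEQ elements directly. The Python 3 sketch below is not the course's solution: it returns plain dictionaries instead of Isequence objects, and the sample document and helper names are assumptions made for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample in the format with 'type' as an attribute of SEQ.
SAMPLE = """<SEQUENCEDATA>
  <SEQ type="dna">
    <NAME>Aspergillus awamori</NAME>
    <ID>U03518</ID>
    <DATA>aacctgcggaaggatcatta</DATA>
  </SEQ>
</SEQUENCEDATA>
"""


def text_of(parent, tag):
    """Return the text of the first <tag> below parent, or '' if there is none."""
    # getElementsByTagName searches all descendants, not only direct children.
    found = parent.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return ""
    return (found[0].firstChild.nodeValue or "").strip()


def read_sequences(xml_text):
    """Return one dictionary per SEQ element in the document."""
    dom = parseString(xml_text)
    if dom.documentElement.nodeName != "SEQUENCEDATA":
        raise ValueError("This is not a sequence file")
    records = []
    for seq in dom.getElementsByTagName("SEQ"):
        records.append({
            "type": seq.getAttribute("type"),
            "name": text_of(seq, "NAME"),
            "id": text_of(seq, "ID"),
            "sequence": text_of(seq, "DATA"),
        })
    return records


records = read_sequences(SAMPLE)
print(records)
```

The recursive version on the slides gives finer control over which nodes are visited; this version trades that control for brevity.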

22 What if the XML sequence format changes?
Now the name of the finder of the sequence is also stored as a new tag:
[Figure: the SEQUENCEDATA tree from slide 19 with a new FOUNDBY node under each SEQ (type) node, next to NAME, ID and DATA]

23 Robustness of XML format
Our xml2i filter still works:
–Can’t extract the finder information: ignores the foundby node
–But: doesn’t crash! Still extracts other information
–Easy to incorporate new info

        def traverse_dom_tree( self, node ):
            """Recursive method that traverses the DOM tree"""

            if node.nodeName == "SEQ":
                ..
                # next recursively traverse the child nodes of this SEQ node:
                for child in node.childNodes:
                    self.traverse_dom_tree( child )

            elif node.nodeName == "NAME":
                self.iseq.set_name( node.firstChild.nodeValue )

            elif node.nodeName == "ID":
                self.iseq.set_id( node.firstChild.nodeValue )

            elif node.nodeName == "DATA":
                self.iseq.set_sequence( node.firstChild.nodeValue )

[Figure: subtree SEQ (type) with children NAME, ID, DATA, FOUNDBY]
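The claim that an unknown tag is skipped without a crash can be tried out directly. The following Python 3 sketch mirrors the dispatch-by-tag-name idea of traverse_dom_tree with dictionaries instead of Isequence objects; the FIELDS table, the function names, and the two sample documents (including the finder name BiRC inside FOUNDBY) are assumptions for the demonstration.

```python
from xml.dom.minidom import parseString

# Tag names the reader knows about, mapped to record keys.
FIELDS = {"NAME": "name", "ID": "id", "DATA": "sequence"}


def collect(node, records):
    """Recursively visit node; unknown nodes are silently skipped."""
    if node.nodeName == "SEQ":
        records.append({"type": node.getAttribute("type")})
        for child in node.childNodes:
            collect(child, records)
    elif node.nodeName in FIELDS and records and node.firstChild is not None:
        records[-1][FIELDS[node.nodeName]] = node.firstChild.nodeValue
    # Anything else (whitespace text, FOUNDBY, ...) matches no branch.


def read(xml_text):
    records = []
    for child in parseString(xml_text).documentElement.childNodes:
        collect(child, records)
    return records


OLD = ('<SEQUENCEDATA><SEQ type="dna"><NAME>Aspergillus awamori</NAME>'
       '<ID>U03518</ID><DATA>aacctg</DATA></SEQ></SEQUENCEDATA>')
# Same record in the extended format with the additional FOUNDBY tag.
NEW = ('<SEQUENCEDATA><SEQ type="dna"><NAME>Aspergillus awamori</NAME>'
       '<ID>U03518</ID><DATA>aacctg</DATA><FOUNDBY>BiRC</FOUNDBY></SEQ></SEQUENCEDATA>')

old_records = read(OLD)
new_records = read(NEW)
print(old_records == new_records)
```

Supporting the new tag would only take one more entry in FIELDS, which is the "easy to incorporate new info" point of the slide.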

24 Compare with extending Fasta format
Say that the Fasta format is modified so the finder appears in the second line after a >:

    >HSBGPG Human gene for bone gla protein (BGP)
    >BiRC
    CGAGACGGCGCGCGTCCCCTTCGGAGGCGCGGCGCTCTATTACGCGCGATCGACCC
    ..

Our Fasta parser would go wrong:

    for line in lines:
        if line[0] == '>':
            # new sequence starts
            items = line.split()
            # put new Isequence obj. in list
            ..
        elif self.iseq:
            # we are currently building an iseq object, extend its sequence
            self.iseq.extend_sequence( line.strip() )   # skip trailing newline
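What "go wrong" means can be shown with a stripped-down reader that follows the same rule as the loop above: every line starting with > opens a new sequence. This is a Python 3 sketch with dictionaries in place of Isequence objects, and the shortened sequence line is only sample data.

```python
def parse_fasta(lines):
    """Naive Fasta reader: every line starting with '>' opens a new record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line[0] == ">":
            records.append({"header": line[1:], "sequence": ""})
        elif records:
            # extend the sequence of the record currently being built
            records[-1]["sequence"] += line
    return records


original = [
    ">HSBGPG Human gene for bone gla protein (BGP)",
    "CGAGACGGCGCGCGTCCCCTTCGG",
]
# Modified format: the finder appears on a second '>' line.
modified = [
    ">HSBGPG Human gene for bone gla protein (BGP)",
    ">BiRC",
    "CGAGACGGCGCGCGTCCCCTTCGG",
]

good = parse_fasta(original)
bad = parse_fasta(modified)
print(len(good), len(bad))
```

On the modified file the reader reports two sequences: an empty one named after the gene and one named BiRC that holds the gene's data. Nothing crashes, but the result is silently wrong.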

25 XML robust
So, the good thing about XML is that it is robust because of its well-defined structure
Widely used, i.e. this overall tag structure won’t change
Parsers available in Python already:
–Read XML into a DOM tree
–DOM tree can be traversed but also manipulated (see next slide)
–Read XML using so-called SAX method
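The SAX method is only named on the slide, so here is a brief sketch of what it looks like in Python 3 with the standard xml.sax module. Instead of building a tree, the parser calls handler methods as it reads start tags, text and end tags. The class name SeqHandler and the sample document are made up for the illustration.

```python
import xml.sax

SAMPLE = """<SEQUENCEDATA>
  <SEQ type="dna">
    <NAME>Aspergillus awamori</NAME>
    <ID>U03518</ID>
    <DATA>aacctg</DATA>
  </SEQ>
</SEQUENCEDATA>
"""


class SeqHandler(xml.sax.ContentHandler):
    """Collects one dictionary per SEQ element while the parser streams."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._text = []

    def startElement(self, name, attrs):
        self._text = []                      # start collecting fresh text
        if name == "SEQ":
            self.records.append({"type": attrs.get("type", "")})

    def characters(self, content):
        self._text.append(content)           # text may arrive in pieces

    def endElement(self, name):
        if name in ("NAME", "ID", "DATA") and self.records:
            self.records[-1][name.lower()] = "".join(self._text).strip()


handler = SeqHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
sax_records = handler.records
print(sax_records)
```

Since no tree is kept in memory, this style suits very large files; the price is that the handler has to track its own state.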

26 See all the methods and attributes of a DOM tree on pages 537ff
Possible to manipulate the DOM tree using these methods: add new nodes, remove nodes, set attributes etc.
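A few of these manipulation methods in action, as a Python 3 sketch using xml.dom.minidom. The sample document and the finder name BiRC are assumptions; the methods createElement, createTextNode, appendChild, setAttribute, removeChild and toxml are part of minidom.

```python
from xml.dom.minidom import parseString

dom = parseString(
    '<SEQUENCEDATA><SEQ type="dna">'
    '<NAME>Aspergillus awamori</NAME><ID>U03518</ID>'
    '</SEQ></SEQUENCEDATA>'
)
seq = dom.getElementsByTagName("SEQ")[0]

# Add a new node: <FOUNDBY>BiRC</FOUNDBY> as the last child of SEQ.
found_by = dom.createElement("FOUNDBY")
found_by.appendChild(dom.createTextNode("BiRC"))
seq.appendChild(found_by)

# Set (here: overwrite) an attribute.
seq.setAttribute("type", "DNA")

# Remove a node.
id_node = seq.getElementsByTagName("ID")[0]
seq.removeChild(id_node)

remaining = [child.nodeName for child in seq.childNodes]
print(remaining)
print(seq.toxml())   # serialize the modified subtree back to XML text
```

The changes exist only in memory until the tree is serialized again, for example with toxml.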

27 Remark: book uses old version of DOM parser
XML examples in book won’t work (except the revised fig16.04)
Look in the presented example programs to see what you have to import
All the methods and attributes of a DOM tree on pages 537ff are the same

28 Intermezzo 2
1. Copy this file and take a look at it in your editor: /users/chili/CSS.E03/Intermezzi/data.xml Any idea what this data is?
2. Open the file in a browser. Expand and collapse nodes by clicking the - and + symbols. Do you see the structure of the tree? Any idea what the data represents now?
3. Copy this program to the same directory. Run it and find the name of Jakob's mother's father's mother. See how the program works?
4. Modify the program so it reports the birth year of the current person as well as the name.
5. Enhance the program so the user can also go back to the son or daughter of the current person. See table on page ..
6. If you have time: Enhance the program so it prints the current person's mother-in-law, if she exists.

29 solution

    # (fragment: body of the program's main loop)
    name = person.getAttribute( "n" )
    print( "%s" %name )

    if name != 'Jakob':
        print "%s's mother in law is" %name,
        parentNode = person.parentNode
        # parentNode is either an 'm' or an 'f' node. If it is a mother
        # node, we need the father node, and vice versa:
        if parentNode.nextSibling:
            spouse = parentNode.nextSibling.firstChild
        else:
            spouse = parentNode.previousSibling.firstChild

        # Now we need the mother of the spouse:
        for childNode in spouse.childNodes:
            if childNode.nodeName == 'm':
                print childNode.firstChild.getAttribute( 'n' )
                break

    input = raw_input( "Report (m)other or (f)ather or (o)ffspring of %s? " %name )
    if input != 'm' and input != 'f' and input != 'o':
        break

    if input == 'o':
        print "\n" + name + "'s offspring is",
        person = person.parentNode.parentNode
    else:
        for child in person.childNodes:
            if child.nodeName == input:
                if input == 'm':
                    print "\nMother of " + name + " is",
                elif input == 'f':
                    print "\nFather of " + name + " is",
                person = child.firstChild
                break