
1 Parsing an XML sequence?
We have the i2xml filter (exercise); we want xml2i as well. We don't have to write an XML parser: Python provides one. Thus, the algorithm:
– Open the file
– Use the Python parser to obtain the DOM tree
– Traverse the tree to extract the sequence information and build Isequence objects
Ignoring whitespace nodes, we have to search a tree like this: a SEQUENCEDATA root with one SEQ child per sequence; each SEQ carries a type attribute and has the children DATA, ID and NAME.
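A minimal sketch of these steps with the standard library's xml.dom.minidom, assuming that is the parser meant here and that the tags are lowercase (sequencedata, seq, name, id, data) with a type attribute on seq. The sample document and the helper element_children are hypothetical. It also shows why whitespace nodes have to be ignored: minidom keeps the indentation between tags as #text children.

```python
from xml.dom import minidom

# Hypothetical sample document laid out like the tree above (tag names assumed)
SAMPLE = """<sequencedata>
  <seq type="dna">
    <name>Aspergillus awamori</name>
    <id>U03518</id>
    <data>aacctgcggaaggatcatta</data>
  </seq>
</sequencedata>"""


def element_children(node):
    """Return the element children of node, skipping whitespace #text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


# Obtain the DOM tree (minidom.parse(filename) does the same for a file)
dom = minidom.parseString(SAMPLE)
root = dom.documentElement

# The raw child list still contains the indentation text nodes
print([child.nodeName for child in root.childNodes])  # ['#text', 'seq', '#text']

# Traverse only the elements
for seq in element_children(root):
    print(seq.getAttribute("type"),
          [child.nodeName for child in element_children(seq)])
# prints: dna ['name', 'id', 'data']
```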

2 xml2i.py (part 1)
We're still being systematic: the parse method gets the usual name, and parsing the file gives us a parse tree with the XML data for free. Starting from the SEQUENCEDATA root, each SEQ (type) subtree is converted to an Isequence object.
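The slide's code is not in this transcript, so here is a hedged sketch of how part 1 could look with xml.dom.minidom. Isequence is a hypothetical minimal stand-in for the course's class, the class Xml2I, the helper parse_string and the lowercase tag names are assumptions, and the SEQ conversion is kept short so the overall shape stays visible.

```python
from xml.dom import minidom


class Isequence:
    """Hypothetical minimal stand-in for the course's Isequence class."""

    def __init__(self, seqtype="", name="", ident="", data=""):
        self.type = seqtype
        self.name = name
        self.id = ident
        self.data = data


class Xml2I:
    """Sketch of an xml2i filter; class and method names are hypothetical."""

    def parse(self, filename):
        """Open and parse an XML file, returning a list of Isequence objects."""
        return self.convert_tree(minidom.parse(filename))

    def parse_string(self, text):
        """Same as parse(), for XML that is already held in a string."""
        return self.convert_tree(minidom.parseString(text))

    def convert_tree(self, dom):
        root = dom.documentElement  # the SEQUENCEDATA node
        return [self.seq_to_isequence(node) for node in root.childNodes
                if node.nodeType == node.ELEMENT_NODE and node.nodeName == "seq"]

    def seq_to_isequence(self, seq):
        """Convert one SEQ subtree to an Isequence object."""
        fields = {}
        for child in seq.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                # Concatenate the #text nodes underneath the element
                fields[child.nodeName] = "".join(
                    t.data for t in child.childNodes if t.nodeType == t.TEXT_NODE)
        return Isequence(seq.getAttribute("type"), fields.get("name", ""),
                         fields.get("id", ""), fields.get("data", ""))
```

Calling Xml2I().parse(path) on a file in this layout would return one Isequence per SEQ element.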

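3 xml2i.py (part 2)
Converting a SEQ (type) subtree with the children DATA, ID and NAME. Points to note in the code: a way of getting to all attributes of a node, and a way of getting to a specific named attribute. Recall: the text of an element is kept in a #text node underneath it.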
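The three DOM features mentioned here look like this in xml.dom.minidom (assumed to be the parser in use; the one-element sample and the helper text_of are illustrative). attributes gives all attributes of a node, getAttribute gives one named attribute (an empty string if it is missing), and an element's text sits in #text child nodes.

```python
from xml.dom import minidom

SEQ_XML = ('<seq type="dna"><name>Aspergillus awamori</name>'
           '<id>U03518</id><data>aacctgcgga</data></seq>')

seq = minidom.parseString(SEQ_XML).documentElement

# All attributes of a node: a NamedNodeMap, here listed as (name, value) pairs
print(seq.attributes.items())      # [('type', 'dna')]

# One specific named attribute
print(seq.getAttribute("type"))    # dna


def text_of(element):
    """Join the data of the #text nodes directly underneath element."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


name_el = seq.getElementsByTagName("name")[0]
print(name_el.firstChild.nodeName)  # #text
print(text_of(name_el))             # Aspergillus awamori
```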

4 What if the XML sequence format changes?
Now the name of the finder of the sequence is stored as a new tag, FOUNDBY: under the SEQUENCEDATA root, each SEQ (type) now has the children DATA, ID, FOUNDBY and NAME.

5 Robustness of the XML format
Our xml2i filter still works, because the DOM parser still works:
– It can't extract the finder information: it ignores the FOUNDBY node
– But it doesn't crash! It still extracts the other information
– It is easy to update the filter to incorporate the new information
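A sketch of that behaviour, assuming an extractor that only picks out the tags it knows about (read_seqs, its known_tags parameter and the lowercase tag names are hypothetical; the sample reuses the identifiers from the Fasta example). An unknown foundby element is simply skipped, and supporting it later means adding one tag name.

```python
from xml.dom import minidom


def text_of(element):
    """Concatenate the #text nodes directly underneath element."""
    return "".join(c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE)


def read_seqs(xml_text, known_tags=("name", "id", "data")):
    """Extract each <seq> as a dict, silently skipping tags not in known_tags."""
    records = []
    for seq in minidom.parseString(xml_text).getElementsByTagName("seq"):
        record = {"type": seq.getAttribute("type")}
        for child in seq.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.nodeName in known_tags:
                record[child.nodeName] = text_of(child)
        records.append(record)
    return records


# Hypothetical document in the new format, with the extra <foundby> tag
NEW_FORMAT = """<sequencedata>
  <seq type="dna">
    <name>Human gene for bone gla protein (BGP)</name>
    <id>HSBGPG</id>
    <foundby>BiRC</foundby>
    <data>CGAGACGGCGCGCGTCCCC</data>
  </seq>
</sequencedata>"""

print(read_seqs(NEW_FORMAT))  # old filter: foundby ignored, no crash
print(read_seqs(NEW_FORMAT, ("name", "id", "data", "foundby")))  # updated filter
```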

6 Compare with extending the Fasta format
Say that the Fasta format is modified so that the finder appears on the second line, after a >:
>HSBGPG Human gene for bone gla protein (BGP)
>BiRC
CGAGACGGCGCGCGTCCCCTTCGGAGGCGCGGCGCTCTATTACGCGCGATCGACCC..
Our Fasta parser would go wrong! A line starting with > marks the start of a new record, so >BiRC would be read as the header of a second sequence.
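To see the failure concretely, here is a minimal Fasta reader (a hypothetical sketch, not the course's parser) in which every > line starts a new record, run on a shortened excerpt of the modified entry above. The real entry ends up with an empty sequence, and the finder line becomes a bogus record that receives the sequence.

```python
def parse_fasta(text):
    """Minimal Fasta reader: every '>' line starts a new (header, sequence) record."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # sequence lines belong to the latest header
    return [(header, seq) for header, seq in records]


# Shortened excerpt of the modified entry, with the finder on the second line
MODIFIED = """>HSBGPG Human gene for bone gla protein (BGP)
>BiRC
CGAGACGGCGCGCGTCCCCTTCGGAGGCGCGGCGCTCTATTACGCGCGATCGACCC
"""

for header, seq in parse_fasta(MODIFIED):
    print(repr(header), len(seq))
```

With XML, the same extra information arrives in its own tag, which a DOM-based reader can skip.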

7 XML is robust
So, the good thing about XML is that it is robust, thanks to its well-defined structure. It is widely used, i.e. this overall tag structure won't change, and other applications can read your XML data. A parser is already available in Python:
– Read the XML into a DOM tree
– The DOM tree can be traversed but also manipulated (see next slide)

8 See all the methods and attributes of a DOM tree on pages 537ff. These methods make it possible to manipulate the DOM tree (add new nodes, remove nodes, set attributes, etc.).
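A short sketch of these three kinds of manipulation with xml.dom.minidom's standard DOM methods (createElement, createTextNode, appendChild, removeChild, setAttribute). The small document and the foundby element are illustrative only; whether these are exactly the methods listed on pages 537ff is an assumption.

```python
from xml.dom import minidom

doc = minidom.parseString("<seq><id>U03518</id><data>aacctg</data></seq>")
seq = doc.documentElement

# Set an attribute (it is created if it does not exist yet)
seq.setAttribute("type", "dna")

# Add a new node: <foundby> holding a #text child
foundby = doc.createElement("foundby")
foundby.appendChild(doc.createTextNode("BiRC"))
seq.appendChild(foundby)

# Remove a node, then release its internal references
data = seq.getElementsByTagName("data")[0]
seq.removeChild(data)
data.unlink()

print(seq.toxml())
# <seq type="dna"><id>U03518</id><foundby>BiRC</foundby></seq>
```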

9 Convert an old-format XML sequence to the new format
Old format: the sequence type has its own tag, TYPE; under SEQUENCEDATA, each SEQ has the children TYPE, DATA, ID and NAME.
New format: the sequence type is an attribute of the SEQ tag; under SEQUENCEDATA, each SEQ (type) has the children DATA, ID and NAME.

10 old_xml2i.py
Add a new method to the original xml2i.py and call it after parsing the XML file.
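One possible shape for the conversion and the extra call, as a hedged sketch: the class OldXml2I, the method old_to_new, the helpers and the lowercase tag names are hypothetical, and records are plain dicts instead of Isequence objects to keep it short. After parsing, each <type> child is moved into a type attribute of its <seq>, so the unchanged new-format extraction can run afterwards; files already in the new format pass through untouched.

```python
from xml.dom import minidom


def text_of(element):
    """Concatenate the #text nodes directly underneath element."""
    return "".join(c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE)


def element_children(node, tag=None):
    """Element children of node, optionally only those with the given tag name."""
    return [c for c in node.childNodes
            if c.nodeType == c.ELEMENT_NODE and (tag is None or c.nodeName == tag)]


class OldXml2I:
    """Hypothetical sketch of old_xml2i: xml2i's parsing plus one new method."""

    def parse_string(self, text):
        dom = minidom.parseString(text)
        self.old_to_new(dom)  # the new method, called right after parsing
        records = []
        for seq in element_children(dom.documentElement, "seq"):
            record = {"type": seq.getAttribute("type")}
            for child in element_children(seq):
                record[child.nodeName] = text_of(child)
            records.append(record)
        return records

    def old_to_new(self, dom):
        """Turn each <type> child of a <seq> into a type attribute on that <seq>."""
        for seq in element_children(dom.documentElement, "seq"):
            # element_children returns a fresh list, so removing nodes is safe here
            for type_el in element_children(seq, "type"):
                seq.setAttribute("type", text_of(type_el).strip())
                seq.removeChild(type_el)
                type_el.unlink()
```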

11 old_xml2phylip.py
Import the new module. Check that the type information is saved in the Isequence (even though it is not used in the phylip format).
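A sketch of the output side, assuming the records produced by the converter are dicts with type, id and data keys (in the real program they would come from importing the new module; here they are passed in directly). It writes the basic sequential PHYLIP layout: a first line with the number of sequences and the sequence length, then each sequence behind a 10-character name field. Using the ID as the name and the function names to_phylip and report_types are hypothetical choices.

```python
def to_phylip(records):
    """Render records in basic sequential PHYLIP format (equal lengths required)."""
    lengths = {len(r["data"]) for r in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs one or more sequences of equal length")
    lines = ["%d %d" % (len(records), lengths.pop())]
    for r in records:
        # Name field: the ID, truncated or padded to exactly 10 characters
        lines.append(r["id"][:10].ljust(10) + r["data"])
    return "\n".join(lines) + "\n"


def report_types(records):
    """The type is not written to PHYLIP, but it is still kept on each record."""
    return ["sequence is of type %s" % r["type"] for r in records]


records = [{"type": "dna", "name": "Aspergillus awamori",
            "id": "U03518", "data": "aacctg"}]
print(to_phylip(records), end="")
print("\n".join(report_types(records)))
```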

12 Testing on an old-format XML sequence
The file U03518b.xml (shown here without its tags) holds the type dna, the name Aspergillus awamori, the ID U03518 and the sequence data:
aacctgcggaaggatcattaccgagtgcgggtcctttgggcccaacctcccatc
cgtgtctattgtaccctgttgcttcggcgggcccgccgcttgtcggccgccgggggggcgcctctgcc
ccccgggcccgtgcccgccggagaccccaacacgaacactgtctgaaagcgtgcagtctgagttgatt
gaatgcaatcagttaaaactttcaacaatggatctcttggttccggc
Running the converter:
python old_xml2phylip.py U03518b.xml U03518b
sequence is of type dna

13 Remark: the book uses an old version of the DOM parser. The XML examples in the book won't work (except the revised fig16.04). Look at the example programs presented here to see what you have to import. All the methods and attributes of a DOM tree on pages 537ff are still the same.