Getting data out of XML
These exercises provide an overview of how to use the native Taverna XPath services to get data out of XML.
The Basics of XML
XML stands for eXtensible Markup Language.
It is designed for the storage and transport of data, which includes passing data between services or retrieving data from a Web page.
It provides a machine-readable dataset.
Many service providers export data in XML.
Example
Katy Paul Reminder Don't forget about Bonn Trip!
The following website has lots of information about XML and tutorials:
Exploring XML
Identify the root (top) element in the example XML.
Find all the child elements.
What does each line end with?
If you get stuck, try exploring the W3Schools website for answers. The syntax page is especially good!
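The same exploration can also be done in code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The XML string is an assumed reconstruction of the example note: only the note and to elements are confirmed by the XPath /note/to used later in these exercises, while the from, heading and body tag names are guesses based on the common W3Schools-style note example.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the example note. Only <note> and <to> are
# confirmed by the exercises; <from>, <heading> and <body> are guesses.
NOTE_XML = """<note>
  <to>Katy</to>
  <from>Paul</from>
  <heading>Reminder</heading>
  <body>Don't forget about Bonn Trip!</body>
</note>"""

# Parse the text into a tree; fromstring returns the root element.
root = ET.fromstring(NOTE_XML)
print("root element:", root.tag)

# Iterating over an element yields its direct children, in document order.
child_tags = [child.tag for child in root]
print("child elements:", child_tags)
```

Each child element opens with a start tag and finishes with a matching end tag, which is worth comparing against what you see at the end of each line.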
Workflows to retrieve XML
Exploring XML
Load the ‘Search Pubmed’ workflow into Taverna from the Bonn myExperiment group.
Run the workflow and see what output you get from PubMed. Try “Blood Clotting” as a search term if you can’t think of anything.
Find the root and child elements in the XML.
See if you can find the list of PubMed ids.
How many ids did you get for your search term? There should be a count of them somewhere.
You should get something like this (with other elements too).
Familiarise yourself with this data. We’ll be extracting some of it next.
XPath and Getting the Data out
XPath is used to navigate through the elements of an XML document.
It is used to find nodes, and the data at those nodes.
‘Expressions’ are used to navigate through the document.
Further details on what to use can be found at:
More information at:
Sample Expressions
Let's have an example
Katy Paul Reminder Don't forget about Bonn Trip!
To get ‘Katy’ from the XML:
‘Katy’ is under the <to> element.
Navigate through the XML, starting at the <note> element and ending at the <to> element.
So the XPath expression would be: /note/to
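To see the expression at work outside Taverna, here is a small sketch in Python. It assumes the same reconstructed note XML (the from, heading and body tag names are guesses). Python's standard ElementTree module supports only a subset of XPath and evaluates paths relative to the root element, so the hypothetical helper select_text below checks the first step of an absolute path against the root tag and searches with the rest.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the example note; only <note> and <to>
# are confirmed by the expression /note/to.
NOTE_XML = """<note>
  <to>Katy</to>
  <from>Paul</from>
  <heading>Reminder</heading>
  <body>Don't forget about Bonn Trip!</body>
</note>"""

root = ET.fromstring(NOTE_XML)


def select_text(root, xpath):
    """Hypothetical helper: evaluate a simple absolute path like /note/to.

    Returns the text of every matching element, or an empty list when
    the first step does not match the root element.
    """
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    # The remaining steps are relative to the root; "." means the root itself.
    relative = "/".join(steps[1:]) or "."
    return [element.text for element in root.findall(relative)]


print(select_text(root, "/note/to"))
```

A full XPath engine, such as the one Taverna uses, accepts /note/to directly; the helper only exists to bridge the gap for this sketch.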
XPath in Taverna
Taverna has 2 modes of XPath functionality: the ‘XML from Text’ local Java service and the ‘Xpath Service’ Template.
The local Java service is designed for people who know the XPath query they want to use and are confident in writing XPath.
The XPath Service Template is designed for dynamic, exploratory retrieval of data, and for those who are not confident writing XPath straight away.
To start with, we will use only the XPath Service Template.
XPath using the Service Template
Install the XPath Plugin
To install the XPath service template, you will have to update the Taverna Workbench.
Click on 'Advanced', then select 'Updates and Plugins'.
In the pop-up menu, click on the 'find new plugins' button.
Find the XPath update, and click 'Install'.
You will need to restart Taverna for this to work correctly.
Don't forget to save any workflows you have open!
Getting the Data out using the template
In Taverna, find the service template for XML data processing.
Drag the service template onto an empty workflow.
The configuration window should automatically open.
Copy and paste the example XML (the Katy XML from the previous slide) into the relevant section of the popup box.
If you haven’t got the data, you can get it from here:
Press the green arrow to generate the XML tree structure (on the right-hand side).
Getting the Data out
[Screenshot of the configuration window, with callouts: ‘Paste here’, ‘Press this’]
Getting the Data out
You should be able to see the XML tree structure.
Explore it by clicking on the “+” arrows to open and close nodes.
Find the <to> node and select it.
Note that this also selects the root node, making a path through the XML to the <to> node.
Click the ‘Generate Xpath Expression’ button.
You should see the XPath, or path to the XML element, given as: /note/to
Getting the Data out
[Screenshot with callouts: ‘Xpath Expression’, ‘Data from XML’]
Getting the Data out using the template
In Taverna, find the service template for XML data processing.
Drag the service template onto the ‘Search Pubmed’ workflow.
The configuration window should automatically open.
Paste the XML from your results pane into the relevant section.
If you haven’t got the data, you can get it from here:
Press the green arrow to generate the XML tree structure (on the right-hand side).
Getting the Data out
What does /default:eSearchResult/default:IdList mean?
It describes how to navigate through the XML, from the root element ‘eSearchResult’ to the IdList element.
‘default’ stands for the namespace of the elements, which is identified by a URI reference to where the data came from.
Click on the ‘Show XML Tree’ button, and select ‘Show namespaces of XML elements’.
This should show you the URI the data came from.
When you have your XPath query set up, click the apply button, close the popup window, and run the workflow.
Try getting something else back from the XML by manually editing the generated XPath query.
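The role of the ‘default’ prefix can be illustrated with a short Python sketch. Everything in the sample below is made up for illustration: the namespace URI http://example.org/esearch is hypothetical (use the URI that Taverna shows for your own results), and the ids and count are invented. The point is only that each step of the path must carry a prefix that is mapped to the namespace URI of the elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and made-up ids, for illustration only.
NS_URI = "http://example.org/esearch"
SAMPLE_XML = f"""<eSearchResult xmlns="{NS_URI}">
  <Count>3</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
    <Id>333</Id>
  </IdList>
</eSearchResult>"""

root = ET.fromstring(SAMPLE_XML)

# Map the prefix used in the XPath to the namespace URI of the elements.
namespaces = {"default": NS_URI}

# Equivalent of /default:eSearchResult/default:IdList, relative to the root.
id_list = root.find("default:IdList", namespaces)
ids = [element.text for element in id_list.findall("default:Id", namespaces)]
count = root.findtext("default:Count", namespaces=namespaces)

print("ids:", ids)
print("count:", count)

# Without the prefix the element is not found, because the plain name
# IdList refers to an element in no namespace.
print("unprefixed lookup:", root.find("IdList"))
```

The prefix itself is arbitrary; only the URI it maps to has to match the one declared in the XML.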
XML advanced
Using the native Java XPath service
Advanced XPath Service
Copy the XML from the results.
Remove the XPath Service template from the workflow.
Locate the XPath service in the list of available services.
Drag it onto the ‘Search PubMed’ workflow.
Advanced XPath Service
Create an input for the service, called ‘xml_text’, and connect it to the port ‘xml-text’.
Add another input port called ‘xpath_query’, and connect it to the ‘xpath’ port.
Connect the nodelist port to an output called ‘element_text’.
Run the workflow, using “Blood Clotting” as your search term.
Enter an XPath query that will retrieve the TermSet counts for all terms in the TranslationStack.
Re-write the XPath to get the count only for the TermSet whose term is: “Blood coagulation”[MeSH Terms]
Choose a data element of your own to get back from the XML.
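If you want to check your queries outside Taverna, the sketch below shows the general technique of filtering with an XPath predicate in Python. The sample XML is a simplified, made-up fragment: the terms and counts are invented, the element layout is an assumption about what the results look like, and the namespace is left out for brevity, so real results would need the ‘default:’ prefix on each step. In full XPath the two queries would read //TranslationStack/TermSet/Count and //TermSet[Term='...']/Count.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragment; real results carry a namespace and
# more elements, and the counts here are invented.
SAMPLE_XML = """<eSearchResult>
  <TranslationStack>
    <TermSet>
      <Term>"blood coagulation"[MeSH Terms]</Term>
      <Count>100</Count>
    </TermSet>
    <TermSet>
      <Term>"blood"[All Fields]</Term>
      <Count>200</Count>
    </TermSet>
    <OP>OR</OP>
  </TranslationStack>
</eSearchResult>"""

root = ET.fromstring(SAMPLE_XML)

# Counts for every TermSet in the TranslationStack.
all_counts = [
    element.text
    for element in root.findall("TranslationStack/TermSet/Count")
]

# A predicate in square brackets keeps only the TermSet whose Term child
# has exactly this text. The value is wrapped in single quotes because
# the term itself contains double quotes.
wanted_term = '"blood coagulation"[MeSH Terms]'
one_count = root.findtext(
    f"TranslationStack/TermSet[Term='{wanted_term}']/Count"
)

print("all counts:", all_counts)
print("count for", wanted_term, "is", one_count)
```

The comparison in the predicate is exact and case-sensitive, so the term has to be written precisely as it appears in the XML.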