A data retrieval workflow using NCBI E-Utils + Python. John Pinney. Tech talk, Tue 12th Nov.

Task
- Produce a data set given particular constraints.
- Allow easy revision/updates as needed.
- Output some kind of report for a biologist.

(One possible) solution
A number of DBs/tools now accept queries via RESTful* interfaces, in principle allowing:
- up-to-date data set retrieval
- fully online analysis workflows
*REST = Representational State Transfer: a client/server architecture that ensures stateless communication, usually implemented via HTTP requests.

Bioinformatics REST services
NCBI E-utils: PubMed, other DBs, BLAST
EBI web services: various
UniProt: protein sequences
KEGG: metabolic network data
OMIM: human genetic disorders
+ many others (see e.g. biocatalogue.org for a registry)

E-Utils services
ESummary, EFetch, ESearch, ELink
all available through

Basic URL API
e.g. retrieve IDs of all human genes:
    + esearch.fcgi?retmode=xml&db=gene&term=9606[TAXID]
    esearch             which EUtil
    retmode=xml         output format
    db=gene             which DB
    term=9606[TAXID]    query term
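The pieces of that URL can be assembled programmatically. The following is a minimal Python 3 sketch, not code from the talk: the helper name `build_eutils_url` is hypothetical, and the base address is the public E-utilities endpoint, which the transcript itself does not show.

```python
from urllib.parse import urlencode

# Assumption: public E-utilities base address (not present in the transcript).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(tool, **params):
    """Return the URL for one E-utility call (hypothetical helper).

    tool   -- which EUtil, e.g. 'esearch'
    params -- query parameters such as retmode, db and term
    """
    # urlencode percent-escapes special characters such as '[' and ']'.
    return EUTILS_BASE + tool + ".fcgi?" + urlencode(params)


# IDs of all human genes, as in the slide's example.
url = build_eutils_url("esearch", retmode="xml", db="gene", term="9606[TAXID]")
print(url)
```

Keeping the EUtil name and the parameters separate makes it easy to swap in another service or database without editing a URL string by hand.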

My tasks
1. Produce a list of human genes that are associated with at least one resolved structure in PDB AND at least one genetic disorder in OMIM.
2. Make an online table to display them.

Easy: Python requests using PyCogent
PyCogent is a Python bioinformatics module that includes convenience methods for interaction with a number of online resources.
    from cogent.db.ncbi import *
    ef = EFetch(id=' ', rettype='fasta')
    protein = ef.read()

Bit more typing but still easy: Python requests using urllib2
For services that are not available through PyCogent, you can construct your own URLs and request them using urllib2.
    import urllib2
    url = " esummary.fcgi?retmode=xml&db=gene&id=7157"
    result = urllib2.urlopen(url).read()
(TIP: use urllib.quote_plus to escape spaces and other special characters when preparing your URL query.)
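The slide targets Python 2. For readers on Python 3, where urllib2 was split into urllib.request and urllib.parse, a rough equivalent might look like the sketch below. The helper names `esearch_url` and `fetch`, the example query term, and the base address are assumptions for illustration; the network call is left commented out.

```python
from urllib.parse import quote_plus
from urllib.request import urlopen

# Assumption: public E-utilities base address (not present in the transcript).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """Build an ESearch URL, escaping the free-text query term (hypothetical helper)."""
    return (EUTILS_BASE + "esearch.fcgi?retmode=xml"
            + "&db=" + quote_plus(db)
            + "&term=" + quote_plus(term))


def fetch(url, timeout=30):
    """Perform the HTTP GET and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# quote_plus turns the space into '+' and percent-encodes the brackets.
url = esearch_url("gene", "Homo sapiens[Organism]")
print(url)
# result = fetch(url)  # uncomment to contact NCBI (needs network access)
```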

Making your life much easier: XML handling using BeautifulSoup
Using retmode=xml ensures consistency in output format, but it can be very difficult to extract the data without a proper XML parser. The simplest and most powerful XML handling in Python I have found is via the BeautifulSoup object model.

Making your life much easier: XML handling using BeautifulSoup
Example: extract all structure IDs linked to a gene (here ID 7153)
    from bs4 import BeautifulSoup
    from cogent.db.ncbi import ELink
    # ask ELink for the structure records linked to the gene
    e = ELink(db='structure', dbfrom='gene', id=7153)
    result = e.read()
    # parse the XML and collect the Id of every Link element
    soup = BeautifulSoup(result, 'xml')
    linkset = soup.eLinkResult.LinkSet
    s = [x.Id.text for x in linkset.LinkSetDb.find_all('Link')]
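The same extraction can be tried offline. The sketch below swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra packages; this is a substitute technique, not the one the talk uses. The XML is a hand-written sample in the general shape of an ELink response, and the linked Ids in it are placeholders rather than real NCBI output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an ELink response; the Link Ids are placeholders.
SAMPLE_ELINK_XML = """<eLinkResult>
  <LinkSet>
    <DbFrom>gene</DbFrom>
    <IdList><Id>7153</Id></IdList>
    <LinkSetDb>
      <DbTo>structure</DbTo>
      <LinkName>gene_structure</LinkName>
      <Link><Id>1001</Id></Link>
      <Link><Id>1002</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def linked_ids(xml_text):
    """Return the Id text of every Link element in an ELink-style document."""
    root = ET.fromstring(xml_text)
    # The path mirrors eLinkResult > LinkSet > LinkSetDb > Link > Id.
    return [el.text for el in root.findall("./LinkSet/LinkSetDb/Link/Id")]


print(linked_ids(SAMPLE_ELINK_XML))
```

One design difference worth noting: if a response contains no LinkSetDb element, this path query simply returns an empty list, whereas chained attribute access on a missing element would raise an error.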

Using WebEnv to chain requests
If you specify usehistory='y', NCBI can remember your output result (e.g. a list of gene IDs) and use it as a batch input for another EUtil request. This is extremely useful for minimising the number of queries for workflows involving large sets of IDs. You keep track of this “environment” using the WebEnv and query_key fields.

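Using WebEnv to chain requests
    from bs4 import BeautifulSoup
    from cogent.db.ncbi import ESearch

    def webenv_search(**kwargs):
        # run the search with history enabled so NCBI stores the result set
        e = ESearch(usehistory='y', **kwargs)
        result = e.read()
        soup = BeautifulSoup(result, 'xml')
        # return the two fields needed to refer to the stored result later
        return {'WebEnv': soup.WebEnv.text,
                'query_key': soup.QueryKey.text}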
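To show how the two returned fields feed into a follow-up request, here is a small offline sketch in Python 3. It is an illustration under stated assumptions: the helper names `history_from_esearch` and `chained_url` are hypothetical, the XML is a hand-written sample with a placeholder WebEnv token, the base address is not in the transcript, and ElementTree stands in for BeautifulSoup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: public E-utilities base address (not present in the transcript).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hand-written sample shaped like an ESearch response; the token is a placeholder.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV_TOKEN</WebEnv>
  <IdList><Id>7157</Id><Id>7153</Id></IdList>
</eSearchResult>"""


def history_from_esearch(xml_text):
    """Extract the WebEnv and query_key fields from an ESearch-style document."""
    root = ET.fromstring(xml_text)
    return {"WebEnv": root.findtext("WebEnv"),
            "query_key": root.findtext("QueryKey")}


def chained_url(tool, history, **params):
    """Build a follow-up URL that uses the stored result set instead of an ID list."""
    query = dict(params, **history)
    return EUTILS_BASE + tool + ".fcgi?" + urlencode(query)


history = history_from_esearch(SAMPLE_ESEARCH_XML)
url = chained_url("elink", history, dbfrom="gene", db="structure", retmode="xml")
print(url)
```

The follow-up URL carries no gene IDs at all; the server looks them up from the stored environment, which is what keeps the number of requests small for large ID sets.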

Workflow for gene list
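The workflow diagram itself is not part of this transcript. As a hedged sketch of the final filtering step only, the task's AND condition can be expressed as a set intersection. The function name `genes_with_both` and the input dictionaries are hypothetical, and the linked IDs are placeholders, not real data.

```python
def genes_with_both(structure_links, omim_links):
    """Return sorted gene IDs that have at least one structure link
    AND at least one OMIM link.

    Both arguments map a gene ID to a list of linked record IDs.
    """
    with_structure = {gene for gene, ids in structure_links.items() if ids}
    with_omim = {gene for gene, ids in omim_links.items() if ids}
    return sorted(with_structure & with_omim)


# Placeholder link tables; in practice these would come from ELink results.
structure_links = {"7157": ["s1", "s2"], "7153": ["s3"], "672": []}
omim_links = {"7157": ["o1"], "7153": [], "672": ["o2"]}

print(genes_with_both(structure_links, omim_links))
```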

My tasks
1. Produce a list of human genes that are associated with at least one resolved structure in PDB AND at least one genetic disorder in OMIM. ✓
2. Make an online table to display them (next time!)

Summary
- Using NCBI EUtils to produce a data set under given constraints was relatively straightforward.
- Resulting code is highly re-usable for future workflows (especially if written as generic functions).

Python modules used
PyCogent: simple request handling for the main EUtils. pycogent.org
urllib2: general HTTP request handler. docs.python.org/2/library/urllib2.html
BeautifulSoup: amazingly easy to use object model for XML/HTML.