Take a REST from manual searching

Slides:



Advertisements
Similar presentations
WEB DESIGN TABLES, PAGE LAYOUT AND FORMS. Page Layout Page Layout is an important part of web design Why do you think your page layout is important?
Advertisements

EBI is an Outstation of the European Molecular Biology Laboratory. Web Services Course CBS, DK. EBI Web Services Teresa Miyar EMBL-EBI External Services.
European Life Sciences Infrastructure for Biological Information Rafael C Jimenez ELIXIR CTO EMBL-EBI workshop networks and pathways.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
Introduction to Web services MSc on Bioinformatics for Health Sciences May 2006 Arnaud Kerhornou Iván Párraga García INB.
Automatic Information Retrieval from Bioinformatics Websites Kang Peng.
Introduction to Web Database Processing
EBI is an Outstation of the European Molecular Biology Laboratory. Web Services Programmatic access to Life Sciences resources. Rodrigo Lopez.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator Protein Sequence Database:
Bioinformatics Lecture 3 BCH 550 Arjumand Warsy. Retrieving Protein Sequences.
Chapter 9 Collecting Data with Forms. A form on a web page consists of form objects such as text boxes or radio buttons into which users type information.
INTRODUCTION TO WEB DATABASE PROGRAMMING
Wrapping third- party analytical services for caBIG Taverna-caBIG project Stian Soiland-Reyes Alexandra Nenadic University of Manchester, UK
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
An Introduction to Designing, Executing and Sharing Workflows with Taverna Nowgen, Next Gen Workshop 17/01/2012.
The Web Services Game. This game is intended for a non technical audience; We have purposely simplified technical aspect. 2.
MuPIT Interactive - Programmatic Link to MuPIT Exercise: Find a protein structure and its amino residue where the genomic location chr17: maps to,
SEMESTER PROJECT PRESENTATION CS 6030 – Bioinformatics Instructor Dr.Elise de Doncker Chandana Guduru Jason Eric Johnson.
EMBRACE Web Services Taavi Hupponen CSC – Center for Scientific Computing, Finland BOSC 2007.
An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.
Chapter 8 Collecting Data with Forms. Chapter 8 Lessons Introduction 1.Plan and create a form 2.Edit and format a form 3.Work with form objects 4.Test.
Web Services interoperability and standards. Infrastructure Challenge ● Applied bioinformatics need various computer resources ● The amount and size of.
Nadir Saghar, Tony Pan, Ashish Sharma REST for Data Services.
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik materials by: Katy Wolstencroft University of Manchester.
Implementing computational analysis through Web services Arnaud Kerhornou CRG/INB Barcelona - BioMed Workshop IRB November 2007.
Moby Web Services Iván Párraga García MSc on Bioinformatics for Health Sciences May 2006.
EMBOSS over a Grid 1. 1st EELA Grid School December 4th of 2006 Eduardo MURRIETA LEON Romualdo ZAYAS-LAGUNAS Pierre-Alain BRANGER Jérôme VERLEYEN Roberto.
WWW: an Internet application Bill Chu. © Bei-Tseng Chu Aug 2000 WWW Web and HTTP WWW web is an interconnected information servers each server maintains.
An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011.
Web Technologies Lecture 8 Server side web. Client Side vs. Server Side Web Client-side code executes on the end-user's computer, usually within a web.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
The Protein Identifier Cross-Reference (PICR) service.
Copyright OpenHelix. No use or reproduction without express written consent1.
EnVisioning Data Integration SME forum 2009, Vienna Henning Hermjakob Henning Hermjakob
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
For EGI/EUDAT EMBL/ELIXIR use-cases Tony Wildish
An Introduction to Running, Reusing and Sharing Workflows with Taverna – part 2 Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester.
MESA A Simple Microarray Data Management Server. General MESA is a prototype web-based database solution for the massive amounts of initial data generated.
Exploring Taverna 2 Katy Wolstencroft myGrid University of Manchester.
For EGI/EUDAT EMBL/ELIXIR use-cases Tony Wildish
National College of Science & Information Technology.
Data and tools on the Web have been exposed in a RESTful manner. Taverna provides a custom processor for accessing such services.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
1 Chapter 1 INTRODUCTION TO WEB. 2 Objectives In this chapter, you will: Become familiar with the architecture of the World Wide Web Learn about communication.
Architecture Review 10/11/2004
Integrating ArcSight with Enterprise Ticketing Systems
Take a REST from manual searching: PDBe, programmatically
The Client-Server Model
ELIXIR Core Data Resources and Deposition Databases
EMBL-EBI, programmatically - take a REST from manual searching: Sequence analysis tools Web Production Team Anna Foix Joon Lee.
CyVerse Discovery Environment
Data Bridge Solving diverse data access in scientific applications
Programmatic access to EMBL-EBI resources
The EBI Search RESTful API
PHP / MySQL Introduction
CISC103 Web Development Basics: Web site:
WEB API.
Mangaldai College, Mangaldai
Florian Gräf Software Developer of the McEntyre group at EMBL-EBI
Web services, WSDL, SOAP and UDDI
Taverna Tutorial exercise 2: REST services from BioCatalogue
Lesson 3 Bioinformatics Laboratory
Aleksandra Pawlik materials by Katy Wolstencroft
Python and REST Kevin Hibma.
REST Services Data and tools on the Web have been exposed in both WSDL and REST. Taverna provides a custom processor for accessing REST services Peter.
Lab 2: Information Retrieval
Welcome - webinar instructions
An Introduction to Designing and Executing Workflows with Taverna
Web Application Development Using PHP
Presentation transcript:

Take a REST from manual searching Programmatic access to Tools and Databases at EMBL-EBI Andrew Cowley andrew.cowley@ebi.ac.uk support@ebi.ac.uk

Contents What are we trying to do? Why consider programmatic access? How web services work How do you use web services (in practice) Clients and demo Tips and pitfalls Where to get help

What are we trying to do? Bioinformatics! Science of storing, retrieving and analysing large amounts of biological information From molecules to medicine Interpreting human variation Smarter farming etc.

Data resources at EMBL-EBI 17/10/2017 Data resources at EMBL-EBI Genes, genomes & variation European Nucleotide Archive European Variation Archive European Genome-phenome Archive Ensembl Ensembl Genomes GWAS Catalog Metagenomics portal Gene, protein & metabolite expression Systems RNA Central Functional Genomics PRIDE Metabolights BioModels BioSamples Enzyme Portal IntAct Reactome Protein sequences, families & motifs InterPro Pfam UniProt The slide shows the core resources at the EBI to show the range of data you can access through the EBI. Molecular structures Protein Data Bank in Europe Electron Microscopy Data Bank Literature & ontologies Chemical biology Europe PubMed Central BioStudies Gene Ontology Experimental Factor Ontology ChEMBL SureChEMBL ChEBI

Why consider programmatic access?

Browser interface Easy interaction Visual input and results interfaces Access to the latest versions of softwares and data

Browser interface - disadvantages Can only run one task at a time Repetitive analysis is tedious Limited workflow capabilities

Local install Download tools and data locally Run as many tasks as you have power for Easy integration into workflows

Local install - disadvantages Expertise/privileges might be needed Local compute and storage requirements How do you keep up to date?

Another approach – programmatic access Interface with EMBL-EBI servers programmatically Use our compute, plus latest data Run many tasks simultaneously Easy to integrate in workflow/own website/frontend

Programmatic access - disadvantages Not unlimited access (though still more access than any other method) Little bit of programmatic knowledge still needed Still using our data

How does it work?

How does it work?

How does it work?

What are web services? Web service = the server interface that responds to a defined request-response message system

Message systems Two main types of message systems SOAP REST Both (can) use HTTP as the protocol

SOAP Simple Object Access Protocol Wraps requests and responses in XML envelopes Definitions (eg parameters) described by a WSDL (Web Service Definition Language) for each service Historically popular, but now more commonly superseded by..

REST REpresentational State Transfer 17/10/2017 REST REpresentational State Transfer In most cases just uses URLs and HTTP verbs (GET, POST, PUT and DELETE) Lighter weight, quicker to process REST is actually a style, not a protocol, so services are described as RESTful, rather than implementing REST Some developers provide web services over RESTful-like interfaces, which don’t conform to the full style guidelines

REST examples EBI Search – search for entries in our complete collection https://www.ebi.ac.uk/ebisearch/ws/rest/?query=globin

REST examples dbfetch – retrieve entries across our collection https://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb/WAP_RAT

Synchronous web services

But if the task takes too long..

Asynchronous web services 17/10/2017 Asynchronous web services JobID “Is JobID done?” Called ‘polling’ “Still cooking”

Asynchronous web services “Is JobID done?” “It’s ready!” Retrieve JobID

How do I know which parameters to use? Look at documentation Use WSDL (for SOAP) But for REST? Might be a WADL – Web Application Description Language Query parameters programmatically

Querying parameters Many web services return details of parameters when queried https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/parameters/

Querying parameters https://www.ebi.ac.uk/Tools/services/rest/ncbiblast/parameterdetails/program /

Querying parameters These can be built into a website that uses the Swagger framework Creates interactive documentation

EBI Search swagger https://www.ebi.ac.uk/ebisearch/swagger.ebi

EBI Search swagger https://www.ebi.ac.uk/ebisearch/swagger.ebi

Results Most web services also give a choice of results Different formats eg. Raw text, XML Images, identifiers etc.

Steps... Input parameters Meta-Information List parameters Get parameter details → Name, description, values... Submission Run (Email, title, values...) → Job Identifier Check status → RUNNING, FINISHED, ERROR... Results analysis List results available → Name, description, media type... Get result → Output, text, binaries (images)... Input parameters Job identifier (e.g. iprscan-S20110708-094729-0726-35857540-pg)

How do you use them in practice?

How do you use them in practice? Many ways to use them Generally incorporate the calls into a program or script Can be run from command line, or called from your own website, application or workflow Many EMBL-EBI services have example clients – can be used as a guide, or even by themselves

Clients Available for a range of programming languages Python, PERL, Java etc. Freely available to download, modify etc.

Sequence analysis tools web services Documentation and clients available at: www.ebi.ac.uk/Tools/webservices

Demo BLAST search

Database selection Sequence type Sequence input Program

Results

Results Job Identifier Results tabs

Results – Visual Output Shows where the alignment is occurring in the sequences

Results – Functional Predictions Domain and families predictions from InterPro

Now with web services.. Download client from www.ebi.ac.uk/Tools/webservices Run without arguments to check usage/help Carry out search Email address Database (uniprotkb_swissprot) Sequence type/stype (protein) Program (blastp) Input sequence

Web Services workflow $ncbiblast_lwp.pl --email email@example.org --program blastp --database uniprotkb_human --stype protein P01174.fasta $wsdbfetch_soaplite.pl fetchData @<jobid>.ids.txt fasta > P01174search2016_05_15.fasta $kalign_soaplite.pl --email email@example.org P01174search2016_05_15.fasta

Web Services workflow $ncbiblast_lwp.pl --email email@example.org --program blastp --database uniprotkb_human --stype protein P01174.fasta | wsdbfetch_soaplite.pl fetchData @- fasta | kalign_soaplite.pl --email email@example.org -

Workflows Galaxy Taverna

Tips and pitfalls

Parallelising and batch running Don’t go mad! Check terms of use Sequence analysis tools generally have limit of 30 simultaneous jobs Going beyond this can affect your job speed and if disruptive you may be blocked If using a third party tool, check if it’s already submitting multiples – eg BLAST2GO or runIPRscan

Parallelising and batch running Use script to automate new job submissions, iterating input files When status FINISHED, submit new job. Use short cuts in clients to make things easier Eg --multifasta flag - allows input to be a file containing multiple fasta format sequences, client will work through sequentially Split input into 30 files, launch 30 --multifasta jobs

Bring back only the results you need By default all outputs are sent back, including graphics You can reduce space/transfer by just returning output of interest Eg ID list, BLAST report etc. --polljob --outformat ids --jobid <jobID> Check what result types are available --resultTypes --jobid <jobID>

Error messages – three levels Client May return error on invalid/missing parameters etc. Web service/server May return error if problem with connectivity/bad http Failed validation Tool May return error if problem running/completing the job

Common errors NOT_FOUND ERROR/FAILURE Check jobID Check submission date – results only stored 7 days ERROR/FAILURE Check parameters/input – validation might have failed FINISHED, but results include .error.txt file Check error message Check input – correct format, too large? Check tool usage – right tool for the task?

Where to get help

Getting help Documentation available via Help & Documentation link Helpdesk available via Feedback button or Current Protocols in Bioinformatics: Unit 3.12 Using EMBL-EBI Services via Web Interface and Programmatically via Web Services www.ebi.ac.uk/support/

Citing use Programmatic access to bioinformatics tools from EMBL-EBI update: 2017 Chojnacki S, Cowley A, Lee J, Foix A, Lopez R. Nucleic Acids Res Web Server issue (2017) DOI: 10.1093/nar/gkx273

Thank you! Support: www.ebi.ac.uk/support/ DOI: 10.1093/nar/gkx273 DOI: 10.1002/0471250953.bi0312s48 DOI: 10.1093/nar/gkt376 Andrew Cowley andrew.cowley@ebi.ac.uk support@ebi.ac.uk