EMBOSS, MyGrid and EMBRACE
  European Molecular Biology Open Software Suite
  Taverna workbench and workflows
  Web Services
Peter Rice
pmr@ebi.ac.uk

Who do we serve?
  Expert software developers
    Bioinformaticians
    Computer scientists
  Expert users
    Biology research community
    Industry
  Scientific users
22.05.2018 China-UK Data Transmission

What do we serve?
  Sequence analysis tools
    Open source
    Comprehensive package
  Data resources
    Public sequence database resources
    Locally installed data
    Users’ own datasets
  Workflows
    Taverna workbench
  Web services
    Standard SOAP web services
    Web service registry

EMBOSS: A quick introduction
  European Molecular Biology Open Software Suite
  Open source package for sequence analysis
    ANSI C source code
    GPL licensed applications, LGPL libraries
    200+ applications
    100+ third party applications in 15 associated packages
  Project started 1996 at Sanger Centre and HGMP
  Now based at EBI
  Release 6.1.0, 15th July 2009
  Funded by UK-BBSRC and EMBL-EBI

EMBOSS World Wide
  We have users on every continent, and a picture to prove it. This is British Antarctica.
  We are promised another photo from the frozen North.
  The first EMBOSS course was in Beijing, April 1999.
  The wEMBOSS interface is from Canada, Argentina and Belgium.

EMBOSS command line interface
  EMBOSS applications run from the command line
  This is not the only interface
    There are over 100 interfaces and packaged systems available
    Web interfaces
    Graphical user interfaces (GUIs)
    Web services
  All applications have a command definition file (.acd)
    Defines all inputs, outputs, and other options
    Read at startup
    Contains all command line options with descriptions
    Template for any other interface

EMBOSS command line example

    % antigenic
    Input protein sequence(s): uniprot:actb1_fugru
    Minimum length of antigenic region [6]:
    Output report [actb1_fugru.antigenic]:

    % antigenic uniprot:actb1_fugru -auto

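The second invocation above runs without prompting. A minimal Python sketch of driving that non-interactive form from a script is given here. It assumes the qualifier names `-sequence`, `-minlen` and `-outfile` mirror the parameter names in the antigenic ACD file, and that the `antigenic` binary is on the PATH. The helpers `build_antigenic_command` and `run_antigenic` are hypothetical names used only for illustration.

```python
import shutil
import subprocess


def build_antigenic_command(sequence, minlen=6, outfile=None):
    """Assemble an argument list for a non-interactive antigenic run.

    Hypothetical helper: the qualifier names are assumed to match the
    ACD parameter names (sequence, minlen, outfile).
    """
    cmd = ["antigenic", "-sequence", sequence, "-minlen", str(minlen)]
    if outfile is not None:
        cmd += ["-outfile", outfile]
    # -auto turns off prompting, so unset options take their defaults.
    cmd.append("-auto")
    return cmd


def run_antigenic(sequence, **options):
    """Run antigenic if it is installed; return None when it is not."""
    if shutil.which("antigenic") is None:
        return None
    return subprocess.run(build_antigenic_command(sequence, **options), check=True)


if __name__ == "__main__":
    # Show the command that would be executed for the slide's example.
    print(" ".join(build_antigenic_command("uniprot:actb1_fugru")))
```

Building the argument list separately from running it keeps the command easy to inspect and log before anything is executed.
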
EMBOSS ACD File

    application: antigenic [
      documentation: "Finds antigenic sites in proteins"
      groups: "Protein:Motifs"
    ]

    section: input [
      information: "Input section"
      type: "page"
    ]

      seqall: sequence [
        parameter: "Y"
        type: "PureProtein"
      ]

    endsection: input

    section: required [
      information: "Required section"
    ]

      integer: minlen [
        standard: "Y"
        minimum: "1"
        maximum: "50"
        default: "6"
        information: "Minimum length of antigenic region"
      ]

    endsection: required

    section: output [
      information: "Output section"
      type: "page"
    ]

      report: outfile [
        parameter: "Y"
        rformat: "motif"
        multiple: "Y"
        taglist: "int:pos=Max_score_pos"
      ]

    endsection: output

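Because the ACD file lists every option with its description and default, another interface can generate its prompts or form fields from it. The following Python sketch reads a simplified subset of the ACD syntax (blocks of the form `type: name [ attribute: "value" ... ]`) and rebuilds the command line prompt for `minlen`. It is not the EMBOSS parser: `parse_acd` and `prompt_for` are hypothetical helpers, and the sketch assumes values never contain square brackets or embedded quotes.

```python
import re

# A fragment of the antigenic ACD file, with closing brackets included.
ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
integer: minlen [
  standard: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length of antigenic region"
]
'''

# type: name [ ...attributes... ]  (simplified: no nested brackets)
BLOCK = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
# attribute: "value"
ATTR = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of (type, name, attributes) tuples."""
    entries = []
    for match in BLOCK.finditer(text):
        kind, name, body = match.groups()
        entries.append((kind, name, dict(ATTR.findall(body))))
    return entries


def prompt_for(attrs):
    """Build an interactive prompt from an option's attributes."""
    return f'{attrs["information"]} [{attrs["default"]}]: '


if __name__ == "__main__":
    for kind, name, attrs in parse_acd(ACD_TEXT):
        if "default" in attrs:
            print(prompt_for(attrs))
```

The prompt built for `minlen` has the same shape as the one shown in the command line example, which is the sense in which the ACD file acts as a template for other interfaces.
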
EMBOSS makes things easy
  ACD files define sequence input
    Sequence type for DNA/protein, possible ambiguity codes, gaps
  Sequences in files
    40+ formats supported, with auto detection
  Sequence databases
    Remote servers: SRS, Entrez, MRS, URL
    Locally indexed, using the original data files
    Local script utilities
  Sequence output
    40+ formats supported: sequence and features
  DAS support (Distributed Annotation System)

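A single string can name either a file or a database entry, as with `uniprot:actb1_fugru` in the command line example. The Python sketch below splits such an address into an optional format, a source (file or database name) and an optional entry. It assumes the `format::source:entry` layout and ignores other forms that EMBOSS accepts, such as list files and wildcards. `SequenceAddress` and `split_address` are hypothetical names, not part of EMBOSS.

```python
from typing import NamedTuple, Optional


class SequenceAddress(NamedTuple):
    fmt: Optional[str]    # explicit format, or None for auto detection
    source: str           # file name or database name
    entry: Optional[str]  # entry identifier, or None for all entries


def split_address(address):
    """Split 'format::source:entry' into its parts (simplified sketch)."""
    fmt = None
    if "::" in address:
        # An explicit format overrides auto detection.
        fmt, address = address.split("::", 1)
    source, sep, entry = address.partition(":")
    return SequenceAddress(fmt, source, entry if sep else None)


if __name__ == "__main__":
    print(split_address("uniprot:actb1_fugru"))
    print(split_address("fasta::seqs.fa"))
```

The file name `seqs.fa` is a made-up example. Whether a source is a file or a database cannot be decided from the string alone, so the sketch leaves that lookup to the caller.
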
Example Dasty screen: Protein annotation

Example Ensembl: DNA annotation

EMBOSS Future plans
  Three open source books: users, developers, admin
    Cambridge University Press
    Original text can be freely reused
  New areas of interest
    Metadata and ontologies (EDAM, taxonomy, GO, SO, …)
    (All) public data resources
    Coordinate systems (Ensembl, gene/protein input/results)
    Project-based working
    Next-generation sequence data, used by ordinary biologists
  100+ new applications
  Database index updates
  Scientific advisory board
  Developer / User courses
    Courses: anywhere, any time

Taverna workbench

Taverna Workbench
  MyGrid UK e-Science project
  Taverna 2.0
    Workbench for bioinformatics data and tools services
    Open source
    Integrates SOAP web services
    Workflows can be saved, and exchanged by email
    Data passed to service, results returned
    Complete record available
  Taverna 2.0
    Designed for scalability
    New workflow model
    Multiple servers
    Data passed by reference

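To illustrate what "data passed to service" means for a SOAP web service, the Python sketch below builds a SOAP 1.1 request envelope that carries its input values inline. The operation name `runAntigenic`, the namespace `urn:example:emboss` and the parameter names are all hypothetical; a real service defines its own in its WSDL, and this is not how Taverna itself constructs requests.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:emboss"  # hypothetical service namespace


def build_request(operation, params):
    """Return a SOAP 1.1 envelope with the parameters passed by value."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each input travels inside the message itself.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("runAntigenic",
                        {"sequence": "uniprot:actb1_fugru", "minlen": 6}))
```

Passing data by reference would instead put a locator for the data in the message, which keeps large datasets out of the request itself.
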
EMBRACE Web Services
  EMBRACE EC Network of Excellence
    18 partners
  Application interface standards for data content:
    DNA and protein sequence data
    Structure and image data
    Gene and protein expression
    Literature and text mining
  Analysis tools using data content standards
    Sequence analysis tools (EMBOSS etc.)
    Structure analysis tools
    ... and tools for other data types
  Taverna as an example user interface

EMBRACE Registry

EMBRACE Registry
  Registry of EMBRACE Web Services
    Requires standard web service definitions
    Test suites defined by service providers
    Simple report of service availability
    Standard annotation
      Requires an ontology of terms for datatypes and methods
  BioCatalogue
    Manchester/EBI joint project
    EMBRACE Registry is a prototype
    Sharing a common schema
    BioCatalogue will take over when EMBRACE ends in 2010.

What do we serve?
  Sequence analysis tools
    Open source
    Comprehensive package
  Data resources
    Public sequence database resources
    Locally installed data
    Users’ own datasets
  Workflows
    Taverna workbench
  Web services
    Standard SOAP web services
    Web service registry

Acknowledgements
  EBI: Peter Rice, Alan Bleasby, Jon Ison, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam
  RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
  LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
  Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
  National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
  Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux...
  IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Accelrys, Cambridge University Press
  Open-Bio Foundation, Sourceforge ...
  And the British Antarctic Survey
  http://emboss.sourceforge.net
  http://emboss.open-bio.org/wiki