Software for the Data-Driven Researcher of the Future Dr. Paul Fisher

What is myGrid?
An e-Science collaboration since 2001, with numerous partners involved:
– Manchester
– Southampton
– Oxford
– EMBL-EBI
It provides sustainable, production-quality software, supported by OMII-UK, EPSRC and BBSRC.
A mixture of developers, bioinformaticians and researchers.
Software | Services | Content | Skills | Community

myGrid Open Suite of Tools
– Client user interfaces
– Workflow GUI Workbench and 3rd-party plug-ins
– Workflow Repository
– Service Catalogue
– Programming and APIs
– Web Portals
– Activity and Service Plug-in Manager
– Provenance Store
– Workflow Server
– Open Provenance Model
– Secure Service Access

Huge amounts of data
– QTL regions: 100+ genes
– Microarray genes
– Next Gen Sequencing: 10,000+ genes
How do I look at ALL the genes systematically?

Issues with current approaches
– The scale of the analysis task overwhelms researchers: lots of data
– User bias and premature filtering of datasets: cherry picking
– A hypothesis-driven approach to data analysis
– Constant changes in the data: problems with re-analysis
– Implicit methodologies (hyperlinking through web pages)
– Error proliferation from any of the issues listed, notably human error
Solution: automate

Web Services
– Technology and standards for exposing code and data resources in a form that a third party can consume remotely
– Describes how to interact with a resource, e.g. its service parameters
Workflows
– A general technique for describing and executing a process
– Describes what you want to do, including the services to use
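To make the split between "how to call a service" and "what you want to do" concrete, here is a minimal Python sketch. Each local function stands in for a remote web service, and the workflow is simply the ordered list of services plus the data passed between them. The names `run_workflow`, `fetch_sequence` and `gc_content` are hypothetical illustrations, not Taverna's API; Taverna itself models a workflow as a dataflow graph rather than the linear chain assumed here.

```python
from typing import Callable, Dict, List

# A "service" is any callable that takes named inputs and returns named outputs.
# In Taverna these would be remote WSDL or REST calls; local functions stand in here.
Service = Callable[[Dict[str, object]], Dict[str, object]]


def run_workflow(steps: List[Service], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run services in order, merging each step's outputs into the shared data."""
    data = dict(inputs)
    for step in steps:
        data.update(step(data))
    return data


# Hypothetical stand-ins for third-party services.
def fetch_sequence(d: Dict[str, object]) -> Dict[str, object]:
    # Pretend lookup: only one made-up gene identifier is "known".
    return {"sequence": "ATGGCC" if d["gene_id"] == "GENE1" else ""}


def gc_content(d: Dict[str, object]) -> Dict[str, object]:
    seq = str(d["sequence"])
    gc = sum(1 for base in seq if base in "GC")
    return {"gc": gc / len(seq) if seq else 0.0}


# The workflow describes WHAT to do; the services encapsulate HOW each step is done.
result = run_workflow([fetch_sequence, gc_content], {"gene_id": "GENE1"})
print(result["gc"])
```

Swapping `gc_content` for another service changes the analysis without touching the other steps, which is the reuse argument the slides make for workflows.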

What kinds of services?
WSDL Web Services, REST, BioMart, R-processor, BioMoby, SoapLab, Grid Services, local Java services, Beanshell, workflows

Who provides the services?
– Open-domain services and resources: Taverna accesses services (11,874 operations)
– Third party: we don't own them, and we didn't build them
– All the major providers: NCBI, DDBJ, EBI …
– No common data model is enforced
– You can include your own services and resources too!

Where can I find these services?

– A public, centralised and curated registry of Life Science Web Services
– A 'Web 2.0'-style website and API
– Allows anyone to register, discover and curate Web Services
– Community-oriented, with expert guidance
– Open content, open source, open platform

[Screenshot: Available services, Workflow diagram, Workflow Explorer]

What are Workflows used for?

Taverna
– First released in 2004
– Current version: Taverna 2.2
– Currently … users per month, 350+ organizations, ~40 countries, 80,000+ downloads across versions
– Freely available, open source (LGPL)
– Windows, Mac OS and Linux
– User and developer workshops
– Documentation
– Public mailing list and direct support

Trypanosomiasis in Africa
Andy Brass, Steve Kemp and many others

Reuse, Recycle, Repurpose Workflows
Dr Paul Fisher, Dr Jo Pennock
– Identify biological pathways implicated in resistance to trypanosomiasis in cattle, using the mouse as a model organism.
– Identify the biological pathways implicated in colitis and helminth infections in the mouse model.
DOI: /ibd | PMID:

Where can I find workflows?

Recycling, Reuse, Repurposing
Share | Search | Re-use | Re-purpose | Execute | Communicate | Record

Bringing myExperiment to the Taverna user: the Taverna plug-in

Take a breath...
myGrid
Taverna
– Workflows are good for automation
– They reduce errors
BioCatalogue
– A publicly curated repository of Web Services
myExperiment
– A Web 2.0 repository supporting workflow discovery and re-use

Taverna and the ‘Cloud’: Analysing Next Generation Sequencing Data

Analysing African Cattle with Taverna 2.2
– Different breeds of African cattle, with 10,000 years of separation
– African livestock adaptations: more productive, increased disease resistance
– Potential outcomes: food security; understanding resistance; understanding environmental …; understanding diversity

The study
– Lots of sites involved in the study:
  – University of Liverpool
  – University of Manchester
  – ILRI (Nairobi) …
– Genetic variation in cattle species
  – African breeds: N’Dama, Boran and Sahiwal
– Resistance to African trypanosomiasis infection (sleeping sickness)
  – Are there genetic differences that make one species more resistant?
  – What are the potential consequences of those genetic differences?
  – Which pathways are affected by those changes?

The Analysis Problem
– Sequenced DNA from 3 cattle breeds using SOLiD / Illumina
– 22 million SNPs for Sahiwal alone
  – N’Dama and Boran: ~11 million SNPs each
  – Large data
– Comparing new data with reference genomes
– Identifying interesting differences
  – e.g. non-synonymous SNPs, stop lost, stop gained, splicing regions, etc.
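The "interesting differences" listed above can be sketched for the simplest case: a single-base substitution inside a known coding codon. The minimal Python example below translates the reference and mutant codons with the standard genetic code and labels the change. It assumes the codon, its reading frame and the substitution position are already known (in practice that comes from Ensembl annotation); `classify_snp` is a hypothetical helper, and splicing-region effects are not covered because they need genomic context beyond the codon.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon: str, pos: int, alt: str) -> str:
    """Classify a substitution of `alt` at position `pos` (0-2) of a coding codon."""
    ref_aa = CODON_TABLE[codon]
    mutant = codon[:pos] + alt + codon[pos + 1:]
    alt_aa = CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"   # a sense codon became a stop codon
    if ref_aa == "*":
        return "stop_lost"     # a stop codon became a sense codon
    return "non_synonymous"    # the amino acid changed


print(classify_snp("TGG", 2, "A"))  # stop_gained (Trp -> stop)
print(classify_snp("GCT", 2, "C"))  # synonymous (Ala -> Ala)
print(classify_snp("GCT", 0, "A"))  # non_synonymous (Ala -> Thr)
```

Only the non-synonymous and stop changes would then be candidates for a damage predictor such as PolyPhen.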

The Analysis Pipeline (in Perl)
MAP: input SNP data from the sequencer; map between genome builds (Liftover)
FILTER: filter for SNPs in exons
ANALYSIS: SNP consequences; identify damaging SNPs (PolyPhen)
Harry Noyes, University of Liverpool
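The MAP stage can be sketched as follows, assuming a greatly simplified model of liftover. The alignment between two builds is reduced to a sorted list of ungapped blocks on a single chromosome and strand, and any SNP outside a block is dropped as unmapped. Real liftover reads UCSC chain files and handles strands, chromosomes and gaps. The block coordinates and the helpers `lift_position` and `lift_snps` below are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical aligned block between two assemblies on one chromosome:
# (old_start, old_end, new_start), half-open [old_start, old_end), sorted by old_start.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a position from the old assembly to the new one, or None if unaligned."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1   # last block starting at or before pos
    if i < 0:
        return None
    old_start, old_end, new_start = blocks[i]
    if pos >= old_end:                  # falls in a gap between blocks
        return None
    return new_start + (pos - old_start)


def lift_snps(snps: List[Tuple[str, int]], blocks: List[Block]) -> List[Tuple[str, int]]:
    """MAP stage: lift every SNP, dropping those outside aligned blocks."""
    lifted = []
    for snp_id, pos in snps:
        new_pos = lift_position(pos, blocks)
        if new_pos is not None:
            lifted.append((snp_id, new_pos))
    return lifted


blocks = [(100, 200, 1100), (300, 400, 1250)]  # made-up coordinates
print(lift_snps([("rs1", 150), ("rs2", 250), ("rs3", 399)], blocks))
# [('rs1', 1150), ('rs3', 1349)]
```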

Workflow and phases
– Input SNP file
– Populate the DB with the starting SNPs and resource version numbers
– Lift-over: maps between the UMD3 and BTA4 cow assemblies
– Exon positions from Ensembl
– Find SNPs in exon regions
– PolyPhen to mark “dangerous” SNPs
The result can be either a MySQL database or a TSV / CSV download.
MSc student: Mohammad Khodadadi
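The "find SNPs in exon regions" phase and the TSV output can be illustrated with a short Python sketch. Exon intervals, standing in for positions that would come from Ensembl, are merged so that they no longer overlap, and each SNP is then checked with a binary search. The function names and the example coordinates are hypothetical. This is a sketch of one reasonable approach, not the workflow's actual implementation, and it assumes a single chromosome with half-open intervals.

```python
import csv
import io
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge_exons(exons: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching exon intervals so lookups can use bisect."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_exonic(snps: List[Tuple[str, int]], exons: List[Interval]) -> List[Tuple[str, int]]:
    """FILTER stage: keep only SNPs whose position falls inside some exon."""
    merged = merge_exons(exons)
    starts = [s for s, _ in merged]
    kept = []
    for snp_id, pos in snps:
        i = bisect_right(starts, pos) - 1  # last exon starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            kept.append((snp_id, pos))
    return kept


def to_tsv(rows: List[Tuple[str, int]]) -> str:
    """Render results as TSV, one of the output formats mentioned above."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["snp_id", "position"])
    writer.writerows(rows)
    return buf.getvalue()


exons = [(1000, 1200), (1150, 1300), (2000, 2100)]  # made-up coordinates
snps = [("rs1", 1250), ("rs2", 1500), ("rs3", 2099), ("rs4", 2100)]
print(to_tsv(filter_exonic(snps, exons)), end="")
```

Sorting and merging once, then doing a binary search per SNP, keeps the filter at roughly O((E + S) log E), which matters at the scale of ~11 million SNPs per breed.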

Taverna and the ‘Cloud’

What we will demonstrate
1. Uploading Next Generation Sequencing SNP data to the cloud
2. Creating a new experiment
3. Running a workflow on multiple cloud instances
4. Showing result output, including links to annotated SNPs

Demo

Managing and Processing Data

Accessing Taverna on the Cloud

[Screenshot panels: Jobs, Status, Input, Provenance, Experiment Metadata, Input data summary, Loading inputs]

Summary of Workflow Output
– 11 million SNPs for N’Dama
– Non-synonymous coding SNPs
– PolyPhen predictions: probably damaging
N.B. Differences in the numbers are due to the workflow and PolyPhen filtering process.

New Developments in myGrid

Essential for cloud Taverna
Taverna 2.2 execution engine
– Large data processing
– Pausing, resuming and cancelling workflows
– Retry and parallelisation layer
Taverna 2.2 server
– Remote workflow execution
– Workflows launched from web pages
– Workflows executed on the cloud
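The idea behind a retry and parallelisation layer can be sketched in a few lines of Python. Each call to an unreliable remote service is wrapped so that transient failures are retried with exponential backoff, and the wrapped call is then mapped over many inputs concurrently. This is an analogy, not Taverna's implementation. `with_retry`, `parallel_map` and `flaky_service` are hypothetical names, and the attempt count, delay and worker count are arbitrary defaults.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def with_retry(fn: Callable[[T], R], attempts: int = 3, delay: float = 0.1) -> Callable[[T], R]:
    """Wrap fn so failures are retried with exponential backoff (delay, 2*delay, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(item: T) -> R:
        for attempt in range(attempts):
            try:
                return fn(item)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of retries: propagate the last error
                time.sleep(delay * (2 ** attempt))
        raise AssertionError("unreachable")

    return wrapped


def parallel_map(fn: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Apply fn to every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


# Hypothetical service that fails on the first call for each input.
calls: Dict[int, int] = {}
lock = threading.Lock()


def flaky_service(x: int) -> int:
    with lock:
        calls[x] = calls.get(x, 0) + 1
        n = calls[x]
    if n == 1:
        raise ConnectionError(f"transient failure for {x}")
    return x * x


print(parallel_map(with_retry(flaky_service), [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Threads fit here because remote service calls spend most of their time waiting on the network; CPU-bound local steps would call for processes instead.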

Other new features
– Validation reporting
– Loading and sharing service sets
– Support for offline editing
– New provenance features

ISMB 10 BioCatalogue Plug-in

Training
Tutorials and training
– 58+ tutorials to >900 people
– >20 universities, Life Science Institutes and networks
– Major bio conferences
– Summer schools in Biology and Middleware
Developer and User Days
– Annotation Jamborees
Undergraduate and postgraduate bioinformatics in >30 universities.

More information
myGrid, Taverna, myExperiment, BioCatalogue

Visit us at the myGrid Silver Sponsor Stand

FIN