Katy Wolstencroft University of Manchester

Slides:



Advertisements
Similar presentations
Delivering User Needs: A middleware perspective Steven Newhouse Director.
Advertisements

OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
Designing, Executing and Reusing Scientific Workflows Katy Wolstencroft, Paul Fisher, myGrid.
IBM Watson Research © 2004 IBM Corporation BioHaystack: Gateway to the Biological Semantic Web Dennis Quan
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
Software for the Data-Driven Researcher of the Future Dr. Paul Fisher
Building Scientific Workflows with Taverna and BPEL: a Comparative Study in caGrid Wei Tan 1, Paolo Missier 2, Ravi Madduri 1, Ian Foster 1 1 University.
Service Discovery in my Grid and the Biocatalogue, a Life Science Service Registry Katy Wolstencroft myGrid University of Manchester.
The my Grid project aims to provide middleware layers that make the Information Grid appropriate for the needs of bioinformatics. my Grid is building high.
Personal Data Management Why is this such an issue? Data Provenance Representing links v Representing data Identifying resources: Life Science Identifiers.
Advanced Data Mining and Integration Research for Europe ADMIRE – Framework 7 ICT ADMIRE Overview European Commission 7 th.
Provenance in my Grid Jun Zhao School of Computer Science The University of Manchester, U.K. 21 October, 2004.
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft and Dr Aleksandra.
Tom Oinn,  Download Taverna from  Windows or linux If you are using either.
Taverna and my Grid Basic overview and Introduction Tom Oinn
14/11/11 Taverna Roadmap Shoaib Sufi myGrid Project Manager.
Designing, Executing, Reusing and Sharing Workflows: Taverna and myExperiment Supporting the in silico Experiment Life Cycle Katy Wolstencroft Paul Fisher.
High level Knowledge-based Grid Services for Bioinformaticans Carole Goble, University of Manchester, UK myGrid project
An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.
1 A myGrid Project Tutorial Dr Mark Greenwood University of Manchester With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole.
GGF Summer School 24th July 2004, Italy Middleware for in silico Biology Professor Carole Goble University of Manchester
Standards and Ontologies to Enable Discovery Data and Information Integration Robin McEntire GlaxoSmithKline 19 Nov, 2002.
Introduction to Taverna, an environment For designing and executing workflows Franck Tanoh University of Manchester.
Taverna and my Grid Open Workflow for Life Sciences Tom Oinn
The Grid as Future Scientific Infrastructure Ian Foster Argonne National Laboratory University of Chicago Globus Alliance
Taverna Workflow. A suite of tools for bioinformatics Fully featured, extensible and scalable scientific workflow management system – Workbench, server,
E-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat Newcastle.
KAROLINSKA INSTITUTET International Biobank and Cohort Studies: Developing a Harmonious Approch February 7-8, 2005, Atlanta; GA Standards The P 3 G knowledge.
MyGrid and the Semantic Web Phillip Lord School of Computer Science University of Manchester.
Phase II Additions to LSG Search capability to Gene Browser –Though GUI in Gene Browser BLAST plugin that invokes remote EBI BLAST service Working set.
Taverna Workflows for Systems Biology Katy Wolstencroft School of Computer Science University of Manchester.
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik materials by: Katy Wolstencroft University of Manchester.
Tom Oinn, In general a grid system is, or should be : “A collection of a resources able to act collaboratively in pursuit of an overall.
DAME: A Distributed Diagnostics Environment for Maintenance Duncan Russell University of Leeds.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
High level Grid Services for Bioinformaticans Carole Goble, University of Manchester, UK Robin McEntire, GSK.
LSIDs in a Nutshell Jun Zhao University of Manchester 1 st December, 2005.
MyGrid: open knowledge based high level services for bioinformatics the information Grid Professor Carole Goble University of Manchester, UK
Association of variations in I kappa B-epsilon with Graves' disease using classical and my Grid methodologies Peter Li School of Computing Science University.
An Identity Crisis in the Life Sciences Jun Zhao, Carole Goble and Robert Stevens The University of Manchester, UK Thanks to: Tom Oinn, Matthew Pocock,
Stian Soiland-Reyes myGrid, School of Computer Science University of Manchester, UK UKOLN DevSci: Workflow Tools Bath,
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Taverna Workbench Stuart Owen University of Mancester, UK
My Grid and Taverna: Now and in the Future Dr. K. Wolstencroft University of Manchester.
ACGT: Open Grid Services for Improving Medical Knowledge Discovery Stelios G. Sfakianakis, FORTH.
Bioinformatics Workflows Chris Wroe (based on material from the myGrid team & May Tassabehji / Hannah Tipney Medical Genetics, St Marys)
An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011.
Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science.
MyGrid/Taverna Provenance Daniele Turi University of Manchester OMII f2f Meeting, London, 19-20/4/06.
BIOINFOGRID: Bioinformatics Grid Application for life science MILANESI, Luciano National Research Council Institute of.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
Using DAML+OIL Ontologies for Service Discovery in myGrid Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood
The Semantic Web, Service Oriented Architectures, the my Grid Experience Carole Goble
Portals and my Grid Stefan Rennick Egglestone Mixed Reality Laboratory University of Nottingham.
The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble.
Selected Workflow and Semantic Experiences from my Grid Professor Carole Goble The University of Manchester, UK
MyGrid: Personalised Bioinformatics on the Information Grid Robert Stevens, Alan Robinson & Carole Goble University of Manchester & EBI, UK myGrid project.
Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester.
Taverna, myExperiment and HELIO services Anja Le Blanc Stian Soiland-Reyes Alan Willams University of Manchester.
Workflow and myGrid Justin Ferris IT Innovation Centre 7 October 2003 Life Sciences Grid GGF9.
Genomic Medicine Grid Juan Pedro Sánchez Merino Instituto de Salud Carlos III
Exploring Taverna 2 Katy Wolstencroft myGrid University of Manchester.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
Designing and Sharing Taverna Workflows: Exploring Taverna 2.1 Beta
A myGrid Project Tutorial
Taverna workflow management system
Shim (Helper) Services and Beanshell Services
Scientific Workflows Lecture 15
Presentation transcript:

Katy Wolstencroft University of Manchester myGrid Katy Wolstencroft University of Manchester

Background myGrid middleware components to support in silico experiments in biology Originally designed to support bioinformatics chemoinformatics health informatics medical imaging integrative biology

History EPSRC funded UK eScience Program Pilot Project NOW – OMII uk node

myGrid in OMII-UK myGrid OMII Stack OGSA-DAI March 2006 middleware to assist with access and integration of data from separate sources via the grid. OGSA-DAI March 2006

Virtual Grid of Resources Biology knowledge-rich Applying prior knowledge to new data myGrid middleware to enable interoperation between distributed data and resources – a grid of data – not a grid of resources

Lots of Resources NAR 2005 – over 700 databases Well over 200 from there. 139 different database for biopathways alone. Lincoln Stein describes it a as a bionation – like italy in the 19th Century.

Global in silico biological research The User Community Bioinformatics is an open Community Open access to data Open access to resources Open access to tools Open access to applications Global in silico biological research

The User Community Problems Everything is Distributed Data, Resources and Scientists Heterogeneous data Very few standards I/O formats, data representation, annotation Everything is a string! Integration of data and interoperability of resources is difficult

Data intensive bioinformatics Some are documents Some are images Bioinformatics analyses typically involve visiting many data resources and analytical tools Data intensive bioinformatics Some are documents Some are images Some are people’s web pages ID MURA_BACSU STANDARD; PRT; 429 AA. DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE DE ENOLPYRUVYL TRANSFERASE) (EPT). GN MURA OR MURZ. OS BACILLUS SUBTILIS. OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE; OC BACILLUS. KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE. FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY). FT CONFLICT 374 374 S -> A (IN REF. 3). SQ SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32; MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI

myGrid Approach - Workflows General technique for describing and enacting a process describes what you want to do, not how you want to do it Simple language specifies how bioinformatics processes fit together – processes are web services High level workflow diagram separated from any lower level coding – therefore, you don’t have to be a coder to build workflows Predicted Genes out Sequence Repeat Masker Web service GenScan Web Service Blast Web Service

Scufl + Workflow Object Model Taverna Workbench Application data flow layer Scufl graph + service introspection Scufl + Workflow Object Model Execution flow layer List management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates Workflow Execution Freefluo Workflow enactor Three tiered abstraction Processor invocation layer Processor Processor Processor Processor Processor Processor Processor Bio MART Seq Hound Plain Web Service Soap lab Bio MOBY Local App Enactor

Taverna Workflow Components Freefluo Freefluo Workflow engine to run workflows SOAPLAB Web Service Any Application e.g. DDBJ BLAST Scufl Simple Conceptual Unified Flow Language Taverna Writing, running workflows & examining results SOAPLAB Makes applications available

So many services – semantic discovery Over 3000 services SeqHound – Database of biological sequences and tools BioMart – Federated query system EMBOSS – Sequence analysis tools BioMoby – Collection of web services EBI SOAPLAB – Collection of supported services

What Services we Support shims EMBOSS shims Originally plain vanilla wsdl Collaboration with biomoby Provision for ‘legacy’ wrapping Providing whole workflows as ‘extra’ services Local java – local beanshell scripting Biomart JDBC High throughput data myIB, MIAS-Grid Large-scale genomics Jumbo - chemoinformatics

What shall I do when a service fails? Most services are owned by other people No control over service failure Some are research level Workflows are only as good as the services they connect! To help - Taverna can: Notify failures Instigate retries Set criticality Substitute services Instigate checkpoints for long-running workflows (myIB)

Data Management Workflows can generate vast amount of data - how can we manage and track it? Data AND metadata AND experiment provenance LSIDs - to identify objects Semantic Web technologies (RDF, Ontologies) To store knowledge provenance Taverna workflow workbench & plugins Ensure automated recording

myGrid information model OGSA-Distributed Query Processing Results management LSID mIR Scufl Workflows + Taverna Workflow Workbench Portal & Application tools e-Science coordination e-Science mediator e-Science process patterns e-Science events Notification service Components designed to work together myGrid information model Metadata & provenance management using semantics KAVE Text Mining Services Termino Publication and Discovery using semantics Feta Pedro Ontology Service management

KAVE Data and metadata management Life Science Identifiers Information Model File management Support for custom database building Provenance metadata capture using RDF SRB integration OGSA-DAI integration urn:data:f2 urn:data1 urn:data2 urn:compareinvocation3 urn:data12 Blast_report [input] [output] [distantlyDerivedFrom] SwissProt_seq [instanceOf] Sequence_hit [hasHits] urn:hit2…. urn:hit1… urn:hit50….. [similar_sequence_to] Data generated by services/workflows Concepts [ ] [performsTask] Find similar sequence [contains] Services urn:data:3 urn:hit8…. urn:hit5… urn:hit10….. urn:BlastNInvocation3 urn:invocation5 urn:data:f1 New sequence Missed sequence [hasName] literals DatumCollection [type] LSDatum Properties [directlyDerivedFrom]

Provenance Browsing in Taverna KAVE stores provenance data as RDF – not user friendly Follows the plugin architecture

Results Integration Gene annotation pipeline workflow Integration and visualisation of GD annotation workflow results Provenance Record Custom Data Model Input Result Smarter workflow design incorporating visualisation VBI collaboration

Visualisation SeqVista Utopia

Applications Resistance to trypanosomiasis in cattle in Kenya Andy Brass, Paul Fisher – University of Manchester Microarray QTL SNPs Metabolic pathway analysis Need to access microarray data, genomic sequence information, pathway databases AND integrate the results

myGrid Alliance: Applications Large user community – over 15600 downloads PsyGrid Small molecules, Murray-Rust, Cambridge Mias-Grid Chicken genome Roslin Institute

Workflow Reuse Addisons Disease SNP design Protein annotation Microarray analysis

Taverna is now OMII-UK Taverna 1.3.1 production Sept 2006 Packaging, Installation, Deployment, Maintenance, Testing GridSAM, GRIMOIRES, BioMOBY integration Semantic content for registry Smoothed integration of discovery and metadata management Security AA for KAVE data and metadata management Taverna 2.0 Spring 2007 Redevelopment of the plug in and enactor framework, improved iteration events, data management Close collaboration with pioneers Incremental rollouts to early adopters Workflows: nested workflows as first-class citizens, recursion, improve iteration events. Enactor: rewrite enactor to deal with scalability issues and decouple it so that it can run as a server. GUI: connect windows, make them dockable, interact with workflow diagram window. Semantics: integrate with BioMoby and GRIMOIRES, align FETA gui to taverna's and provenance plugin.

Taverna in OMII-UK Development of Taverna 2.0 reworking of the processor model to include duel execution semantics incorporating data and control flow enhanced support for long-running workflows large scale data transfer improved provenance collection with nested workflows and complex iterations fully distributed workflow enactment and authoring

Acknowledgements Carole Goble and the myGrid team OMII-UK All of our users