A myGrid Project Tutorial


1 A myGrid Project Tutorial
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.

2 Open Source Upper Middleware for Bioinformatics
- (Web) Service-based architecture
- Targeted at Tool Developers, Bioinformaticians and Service Providers
- Sites: Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield

3 myGrid People
- Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
- Users:
  - Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
  - Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
- Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
- Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
- Collaborators: Keith Decker

4 Roadmap - start services data

5 Philosophy
- Openness
  - open source
  - open world of services
  - open to wider eScience context
  - open to user feedback
  - open to third party metadata
- Collection of components for assembly
  - Pick and mix

6 Tenet I
- High level middleware services for data-intensive resource interoperation for bioinformatics
  - Information Grid, not computational Grid
- Exploratory, ad hoc
- For individuals
- In silico experiment as workflow
- Distributed query processing
- Information management

7 Tenet II
- High level services for e-Science experimental management
  - Provenance
  - Event notification
  - Personalisation
- Sharing knowledge and sharing components
  - Scientific discovery is personal & global
  - Federated third party registries for workflows and services
  - Workflow and service discovery for reuse and repurposing
- Registry operations (diagram labels): Register, Find, Annotate
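
The Register, Find and Annotate operations named on this slide can be pictured with a minimal in-memory sketch. This is not the myGrid registry API: the `Registry` class, its method names and the example entries are all hypothetical, chosen only to show third-party annotation driving discovery.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A registered service or workflow (hypothetical structure)."""
    name: str
    kind: str                                   # "service" or "workflow"
    annotations: dict = field(default_factory=dict)


class Registry:
    """Toy registry: register entries, attach metadata, find by metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, name, kind):
        self._entries[name] = Entry(name, kind)

    def annotate(self, name, key, value):
        # Any party may add metadata, not only the service provider.
        self._entries[name].annotations.setdefault(key, set()).add(value)

    def find(self, key, value):
        # Names of all entries carrying the given annotation, sorted.
        return sorted(
            e.name for e in self._entries.values()
            if value in e.annotations.get(key, set())
        )


registry = Registry()
registry.register("blastn", "service")
registry.register("gap_extension", "workflow")
registry.annotate("blastn", "task", "similarity_search")
registry.annotate("gap_extension", "task", "similarity_search")
print(registry.find("task", "similarity_search"))
```

Services and workflows share one lookup here, which mirrors the slide's point that both are discovered for reuse through the same registry.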

8 Tenet III
- Open Source and Open Services
  - No control or influence over service providers
- Open to third party metadata and services
- Open extensible architecture
  - Assemble your own components
  - Designed to work together
  - Toolkit
- Components (diagram labels): Freefluo WfEE, Taverna, View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gateway & Portal, LSID, Haystack Provenance Browser

9 Tenet IV
- (Web) Service architecture
  - Publication, discovery, interoperation, composition, decommissioning of myGrid services
  - WS-I -> OGSA / WSRF
- Metadata driven
  - Ontologies
  - Common information model
  - Semantic Web technologies: RDF, OWL

10 Tenet V
- Middleware for
  - Tool Developers
  - Bioinformaticians
  - Service Providers
- Biologists are indirectly supported by the portals and applications these groups develop.

11 Roadmap run workflows services workflows data discover services data management workflows

12 Data-intensive bioinformatics
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE BINDS PEP (BY SIMILARITY).
FT   CONFLICT S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
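
Records like the one on this slide are line-oriented: a two-letter code says what each line holds, and uncoded lines carry the sequence. The sketch below shows how little code is needed to pull such a record apart. It assumes the usual flat-file layout (code in columns 1 to 2, data from column 6, `//` as record terminator, which does not appear on the slide), and `RECORD` is an abbreviated excerpt, not the full entry. The function name `parse_flat_file` is hypothetical.

```python
from collections import defaultdict

# Abbreviated excerpt of the slide's record; only the first 40 residues are kept.
RECORD = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT
//
"""


def parse_flat_file(text):
    """Group lines by their two-letter code; collect the sequence separately."""
    fields = defaultdict(list)
    sequence = []
    for line in text.splitlines():
        if line.startswith("//"):           # assumed record terminator
            break
        code, value = line[:2], line[5:].strip()
        if code.strip():
            fields[code].append(value)
        else:                                # uncoded lines carry sequence data
            sequence.append(value.replace(" ", ""))
    return dict(fields), "".join(sequence)


fields, seq = parse_flat_file(RECORD)
print(fields["ID"][0].split()[0])
print(fields["KW"][0].rstrip(".").split("; "))
print(len(seq))
```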

13 Use Scenarios
- Graves’ Disease
  - Autoimmune disease of the thyroid
  - Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
  - Discover all you can about a gene
  - Annotation pipelines and gene expression analysis
  - Services from Japan, Hong Kong, various sites in UK
- Williams-Beuren Syndrome
  - Microdeletion of 1.55 Mbases on Chromosome 7
  - Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
  - Characterise an unknown gene
  - Annotation pipelines and gene expression analysis
  - Services from USA, Japan, various sites in UK

14 Manually filling a genomic gap
- Two major steps:
  - Extend into the gap: similarity searches; RepeatMasker, BLAST
  - Characterise the new sequence: NIX, Interpro, etc.
- Numerous web-based services (e.g. BLAST, RepeatMasker)
- Cutting and pasting between screens
- Large number of steps
- Frequently repeated: info now rapidly added to public databases
- Don’t always get results
- Time consuming
- Huge amount of interrelated data is produced: handled in lab book and files saved to local hard drive
- Mundane
- Much knowledge remains undocumented
- Bioinformatician does the analysis

15 WBS Workflows (diagram; labels roughly in flow order)
- Input chain: GenBank Accession No, GenBank Entry, Seqret, Nucleotide seq (Fasta), GenScan, Coding sequence, ORFs
- Nucleotide services: prettyseq, restrict, cpgreport, RepeatMasker, ncbiBlastWrapper, sixpack, transeq
- Their outputs: 6 ORFs; restriction enzyme map; CpG island locations and %; repetitive elements; translation/sequence file (good for records and publications); Blastn vs nr, est databases; amino acid translation
- Protein services: epestfind, pepcoil, pepstats, pscan
- Their outputs: identifies PEST seq; identifies FingerPRINTS; MW, length, charge, pI, etc.; predicts coiled-coil regions
- Further protein services: SignalP, TargetP, PSORTII, InterPro, PFAM, Prosite, Smart
- Their outputs: hydrophobic regions; predicts cellular location; identifies functional and structural domains/motifs
- Open questions: Pepwindow? Octanol?
- BLAST: ncbiBlastWrapper; URL inc GB identifier; tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr
- Gap extension: RepeatMasker, query nucleotide sequence, ncbiBlastWrapper, sort for appropriate sequences only
- Legend: Pink: outputs/inputs of a service. Purple: tailor-made services. Green: Emboss soaplab services. Yellow: Manchester soaplab services. Grey: unknowns.
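
A workflow of this kind is a dependency graph: each service runs once the outputs it needs exist. The sketch below orders and runs a handful of the named services using the standard-library `graphlib` module (Python 3.9+). The wiring in `DEPENDS_ON` is illustrative only and is not the actual WBS workflow definition; the services are stubs that report what they were fed, and `run_order`, `run` and `stub` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps whose output it needs.
# Illustrative wiring only, not the real WBS workflow.
DEPENDS_ON = {
    "seqret": {"genbank_entry"},
    "repeatmasker": {"seqret"},
    "blastn": {"repeatmasker"},
    "genscan": {"seqret"},
    "transeq": {"genscan"},
    "pepstats": {"transeq"},
}


def run_order(depends_on):
    """Return one valid execution order (dependencies before dependents)."""
    return list(TopologicalSorter(depends_on).static_order())


def run(depends_on, services):
    """Call each step with the results of its inputs; keep every result."""
    results = {}
    for step in run_order(depends_on):
        inputs = {d: results[d] for d in sorted(depends_on.get(step, ()))}
        results[step] = services[step](inputs)
    return results


def stub(name):
    """Stand-in for a remote service: echoes its name and its inputs."""
    return lambda inputs: f"{name}({', '.join(inputs.values())})"


services = {name: stub(name) for name in run_order(DEPENDS_ON)}
results = run(DEPENDS_ON, services)
print(results["pepstats"])
```

Keeping every intermediate result in `results` is what replaces the cutting and pasting between screens described on the previous slide.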

16 Graves’ Disease Bioinformatics (diagram)
- Candidate gene pool, Gene ID
- Annotation Pipeline: What is known about my candidate gene?
  - Medline, OMIM, GO, BLAST, EMBL, DQP Query
- Genotype Assay Design System: Is this SNP present in my samples?
  - Primer Design: Emboss Eprimer application in SoapLab
  - Selection of restriction enzyme: Emboss Restrict in SoapLab
  - Talisman
  - Restriction Fragment Length Polymorphism experiment: use primers designed by myGrid to amplify region flanking SNP on the gene
- 3D Protein Structure: What is the structure of the protein product encoded by my candidate gene?
  - PDB: query PDB & display protein structure
  - Swiss-Prot, AMBIT, Interpro: obtain information about protein & extract information about active site
  - AMBIT: determine whether coding SNP affects the active site of the protein
- Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1), (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.

17 Experiment life cycle
- Forming experiments
- Executing and monitoring experiments
- Managing lifecycle, provenance and results of experiments
- Sharing services & experiments
- Discovering and reusing experiments and resources
- Personalisation

18 (e-)Scientists...
- ...Experiment
  - Can workflow be used as an experimental method?
  - How many times has this experiment been run?
- ...Analyze
  - How do we manage the results to draw conclusions from them?
  - How reliable are these results?
- ...Collaborate
  - Can we share workflows, results, metadata etc?
- ...Publish
  - Can we link to these workflows and results from our papers?
- ...Review
  - Can I find, comprehend and review your work?
  - How was that result derived?
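
Two of these questions ("How many times has this experiment been run?" and "How was that result derived?") become simple queries once every service call is logged. The sketch below is a hedged illustration of that idea, not myGrid's provenance model: the `Invocation` record layout, the `LOG` contents and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One recorded service call (hypothetical record layout)."""
    run_id: str
    service: str
    inputs: tuple      # ids of the data items consumed
    output: str        # id of the data item produced


LOG = [
    Invocation("run-1", "seqret", ("accession",), "fasta"),
    Invocation("run-1", "repeatmasker", ("fasta",), "masked"),
    Invocation("run-1", "blastn", ("masked",), "hits"),
    Invocation("run-2", "seqret", ("accession",), "fasta"),
]


def derivation(log, run_id, data_id):
    """Walk backwards from a data item to the service calls that produced it."""
    produced_by = {i.output: i for i in log if i.run_id == run_id}
    chain, pending = [], [data_id]
    while pending:
        inv = produced_by.get(pending.pop())
        if inv is not None and inv not in chain:
            chain.append(inv)
            pending.extend(inv.inputs)
    return [inv.service for inv in chain]


def run_count(log):
    """How many distinct runs appear in the log."""
    return len({i.run_id for i in log})


print(derivation(LOG, "run-1", "hits"))
print(run_count(LOG))
```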

19 Collections of Tasks (diagram)
- Task labels: Finding, Description, Service Discovery, Enactment, Building, Workflow, Provenance, Storage, Data Management, Querying, Domain Tasks
- Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers

20 Components and roles (diagram)
- Components: Registry, mIR, Discovery View, Haystack Provenance Browser, FreeFluo Enactor, Taverna WF Builder, Pedro Annotation tool, Ontology Store, Others
- Service interfaces: WSDL, Soaplab
- Flow labels: Interface Description; Annotation/description; Query & Retrieve; Workflow Execution; Store data/knowledge; invoking; Querying/sharing/federating/registering; Data descriptions; Vocabulary
- Roles: Annotation providers, Scientists, Bioinformaticians, Service Providers

21 myGrid Service Stack (diagram)
- Tiers: Applications, Core services, External services
- Fabric: Web Service (Grid Service) communication fabric
- Roles: Bioinformaticians, Tool Providers, Service Providers
- Components: AMBIT Text Extraction Service, Provenance, Personalisation, Event Notification, Gateway, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, Workbench, Taverna, Talisman, Native Web Services, SoapLab, GowLab, Web Portal, Legacy apps, Registries, Ontologies, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor, Views

22 Two+ Paths
- Core functionality
  - Services: Soaplab and Gowlab
  - Workflow enactment engine: Freefluo
  - Workflow workbench: Taverna
  - Data integration: OGSA-DQP
  - Information model & management
- Innovative work
  - Service and workflow registration
  - Semantic discovery
  - Provenance management
  - Text mining
- In between
  - Event notification
  - Gateway

23 myGrid Service Stack (diagram repeated from slide 21)

24

25 Run the Workflow Viewing intermediate results

26 Run the Workflow

27 Drilling Down: myGrid and Semantics
- Workflow and service discovery
  - Prior to and during enactment
  - Semantic registration
- Workflow assembly
  - Semantic service typing of inputs and outputs
- Provenance of workflows and other entities
- Experimental metadata glue
- Use of RDF, RDFS, DAML+OIL/OWL
  - Instance store, ontology server, reasoner
  - Materialised vs at point of delivery reasoning
- myGrid Information Model
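
"Semantic service typing of inputs and outputs" means a service can be offered as the next workflow step when its input type subsumes the previous step's output type. The sketch below shows that check over a toy is-a tree. It is a deliberate simplification: a DAML+OIL/OWL reasoner handles far richer descriptions, and every term, service annotation and function name here is hypothetical.

```python
# Toy "is-a" hierarchy standing in for an ontology; all terms are hypothetical.
PARENT = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
    "blast_report": "data",
}

# Each service is annotated with the semantic type of its (input, output).
SERVICES = {
    "transeq": ("nucleotide_sequence", "protein_sequence"),
    "blastp": ("protein_sequence", "blast_report"),
    "seq_length": ("sequence", "data"),
}


def is_a(term, ancestor):
    """True if term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def can_follow(producer):
    """Services whose input type accepts the producer's output type."""
    out_type = SERVICES[producer][1]
    return sorted(
        name for name, (in_type, _) in SERVICES.items()
        if name != producer and is_a(out_type, in_type)
    )


print(can_follow("transeq"))
```

Note that `seq_length` matches even though it was annotated with the more general `sequence` type; an exact string match on type names would miss it.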

28 Semantic Discovery (screenshot callouts)
- View annotations on workflow
- Pedro data capture tool
- Drag a workflow entry into the explorer pane and the workflow loads.
- Drag a service/workflow to the scavenger window for inclusion into the workflow.

29 Tutorial focus
- Core functionality
  - Services: Soaplab and Gowlab
  - Workflow enactment engine: Freefluo
  - Workflow workbench: Taverna
  - Data integration: OGSA-DQP
  - Information model & management
- Innovative work
  - Service and workflow registration
  - Semantic discovery
  - Provenance management
  - Text mining
- In between
  - Event notification
  - Gateway

30 Roadmap
- Components: LSID authorities, Taverna workbench, Registry
- Steps:
  1. Describe services
  2. Discover services
  3. Write & run workflows
  4. Provenance & data management
- Diagram labels: services, workflows, data
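
The LSID authorities on this roadmap issue Life Science Identifiers, URNs of the general form `urn:lsid:authority:namespace:object[:revision]`. The sketch below splits such an identifier into its parts. The example identifier under `example.org` is made up, and `parse_lsid` and the `LSID` tuple are hypothetical names rather than part of any LSID toolkit.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]     # the prefix is case-insensitive
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Made-up identifier for illustration only.
lsid = parse_lsid("urn:lsid:example.org:proteins:MURA_BACSU:1")
print(lsid.authority, lsid.object_id, lsid.revision)
```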

31 Sessions on Details
- Workflows: hands on with Taverna
- Semantics
- Timetable: split sessions
  - Session 1
    - Group 1: hands on (Swanson)
    - Group 2: semantics (Newhaven)
  - Tea break (short)
  - Session 2
    - Group 1: semantics (Newhaven)
    - Group 2: hands on (Swanson)
  - Discussions and Conclusions

32 Questions?