Published by Fay Freeman. Modified over 9 years ago.
1 A myGrid Project Tutorial
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.
2 Open Source Upper Middleware for Bioinformatics
(Web) Service-based architecture
Targeted at Tool Developers, Bioinformaticians and Service Providers
Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield
3 myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker
4 Roadmap - start services data
5 Philosophy
Openness
–open source
–open world of services
–open to wider eScience context
–open to user feedback
–open to third party metadata
Collection of components for assembly
–Pick and mix
6 Tenet I
High level Middleware services for data intensive resource interoperation for Bioinformatics
–Information Grid not computational Grid
Exploratory, ad hoc
For individuals
In silico experiment as workflow
Distributed query processing
Information Management
7 Tenet II
High level services for e-Science experimental management
–Provenance
–Event notification
–Personalisation
Sharing knowledge and sharing components
–Scientific discovery is personal & global
–Federated third party registries for workflows and services
–Workflow and service discovery for reuse and repurposing
Registry: Register, Find, Annotate
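The Register / Find / Annotate operations on the registry can be pictured with a small in-memory sketch. The names used here (`Registry`, `Entry`, `register`, `annotate`, `find`) are hypothetical and are not the myGrid registry interface; federation and registry views are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A registered service or workflow plus its annotations."""
    name: str
    kind: str  # "service" or "workflow"
    annotations: dict = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory registry: register, annotate, find."""

    def __init__(self):
        self._entries = {}

    def register(self, name, kind):
        self._entries[name] = Entry(name, kind)

    def annotate(self, name, key, value):
        # Annotations may come from a third party, not only the provider.
        self._entries[name].annotations[key] = value

    def find(self, **criteria):
        # Names of entries whose annotations match every key/value pair.
        return sorted(
            e.name
            for e in self._entries.values()
            if all(e.annotations.get(k) == v for k, v in criteria.items())
        )


registry = Registry()
registry.register("blastp", "service")
registry.register("repeatmasker", "service")
registry.annotate("blastp", "task", "similarity_search")
registry.annotate("repeatmasker", "task", "repeat_masking")
matches = registry.find(task="similarity_search")
```

Keeping annotations separate from registration is the point of the sketch: the party that describes a service need not be the party that published it.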
8 Tenet III
Open Source and Open Services
–No control or influence over service providers
Open to third party metadata and services
Open extensible architecture
–Assemble your own components
–Designed to work together
–Toolkit
Freefluo WfEE, Taverna, View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gateway & Portal, LSID, Haystack Provenance Browser
9 Tenet IV
(Web) Service architecture
–Publication, discovery, interoperation, composition, decommissioning of myGrid services
–WS-I -> OGSA / WSRF
Metadata driven
–Ontologies
–Common information model
–Semantic Web technologies: RDF, OWL
10 Tenet V
Middleware for
–Tool Developers
–Bioinformaticians
–Service Providers
Biologists are supported indirectly, through the portals and applications that these groups develop.
11 Roadmap run workflows services workflows data discover services data management workflows
12 Data-intensive bioinformatics
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
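A flat-file entry like the one on this slide is organised by two-letter line codes (ID, DE, GN, OS, SQ and so on), which is what makes it awkward to cut and paste and easy to parse. The following is a minimal sketch of such a parser, not a full Swiss-Prot reader; the function name `parse_entry` is made up, and the sample keeps only a few lines of the slide's entry and adds the `//` terminator that such entries normally end with.

```python
from collections import defaultdict

SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
//
"""


def parse_entry(text):
    """Group lines by their two-letter code and join the sequence block."""
    fields = defaultdict(list)
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, value = line[:2], line[2:].strip()
        if in_sequence and not code.strip():
            # Sequence lines have a blank code; drop the spacing between blocks.
            residues.append(value.replace(" ", ""))
            continue
        in_sequence = code == "SQ"
        if code.strip():
            fields[code].append(value)
    return dict(fields), "".join(residues)


fields, sequence = parse_entry(SAMPLE)
entry_name = fields["ID"][0].split()[0]
```

Only the first sequence line is included in the sample, so `sequence` here holds 60 of the 429 residues.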
13 Use Scenarios
Graves’ Disease
–Autoimmune disease of the thyroid
–Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
–Discover all you can about a gene
–Annotation pipelines and Gene expression analysis
–Services from Japan, Hong Kong, various sites in UK
Williams-Beuren Syndrome
–Microdeletion of 1.55 Mbases on Chromosome 7
–Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
–Characterise an unknown gene
–Annotation pipelines and Gene expression analysis
–Services from USA, Japan, various sites in UK
14 Manually filling a genomic gap
Two major steps:
–Extend into the gap: similarity searches; RepeatMasker, BLAST
–Characterise the new sequence: NIX, Interpro, etc.
Numerous web-based services (e.g. BLAST, RepeatMasker)
Cutting and pasting between screens
Large number of steps
Frequently repeated – info now rapidly added to public databases
Don’t always get results
Time consuming
Huge amount of interrelated data is produced – handled in lab book and files saved to local hard drive
Mundane
Much knowledge remains undocumented
Bioinformatician does the analysis
15 WBS Workflows: GenBank Accession No GenBank Entry Seqret Nucleotide seq (Fasta) GenScan Coding sequence ORFs prettyseq restrict cpgreport RepeatMasker ncbiBlastWrapper sixpack transeq 6 ORFs Restriction enzyme map CpG Island locations and % Repetitive elements Translation/sequence file. Good for records and publications Blastn Vs nr, est databases. Amino Acid translation epestfind pepcoil pepstats pscan Identifies PEST seq Identifies FingerPRINTS MW, length, charge, pI, etc Predicts Coiled-coil regions SignalP TargetP PSORTII InterPro PFAM Prosite Smart Hydrophobic regions Predicts cellular location Identifies functional and structural domains/motifs Pepwindow? Octanol? ncbiBlastWrapper URL inc GB identifier tblastn Vs nr, est, est_mouse, est_human databases. Blastp Vs nr RepeatMasker Query nucleotide sequence ncbiBlastWrapper Sort for appropriate Sequences only RepeatMasker
Pink: Outputs/inputs of a service
Purple: Tailor-made services
Green: Emboss soaplab services
Yellow: Manchester soaplab services
Grey: Unknowns
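A workflow of this kind is a dependency graph: each service runs once the outputs it consumes are available. The sketch below shows that idea using a handful of the service names from the slide. The wiring between them cannot be recovered from this transcript, so the edges are an illustrative guess, and `execution_order` is a made-up helper, not part of Taverna or Freefluo. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
# These edges are assumptions for illustration, not the original diagram.
workflow = {
    "seqret": {"genbank_entry"},
    "RepeatMasker": {"seqret"},
    "ncbiBlastWrapper": {"RepeatMasker"},
    "GenScan": {"seqret"},
    "transeq": {"GenScan"},
    "pepstats": {"transeq"},
}


def execution_order(graph):
    """Return one valid order in which the steps could be enacted."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
```

Several orders are valid for the same graph; an enactment engine is free to run independent branches (here the RepeatMasker and GenScan branches) in parallel.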
16 Graves’ Disease Bioinformatics Annotation Pipeline
Candidate gene pool What is known about my candidate gene? Medline OMIM GO BLAST EMBL DQP Query Genotype Assay Design System 3D Protein Structure Is this SNP present in my samples? What is the structure of the protein product encoded by my candidate gene? Primer Design Gene ID Restriction Fragment Length Polymorphism experiment SNP Use primers designed by myGrid to amplify region flanking SNP on the gene PDB Query PDB & display protein structure Obtain information about protein & extract information about active site Swiss-Prot AMBIT Interpro Emboss Eprimer application in SoapLab Selection of restriction enzyme Talisman SNP Emboss Restrict in SoapLab AMBIT Determine whether coding SNP affects the active site of the protein
Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1) (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.
17 Experiment life cycle
Discovering and reusing experiments and resources
Managing lifecycle, provenance and results of experiments
Sharing services & experiments
Personalisation
Forming experiments
Executing and monitoring experiments
18 (e-)Scientists…
…Experiment
–Can workflow be used as an experimental method?
–How many times has this experiment been run?
…Analyze
–How do we manage the results to draw conclusions from them?
–How reliable are these results?
…Collaborate
–Can we share workflows, results, metadata etc?
…Publish
–Can we link to these workflows and results from our papers?
…Review
–Can I find, comprehend and review your work?
–How was that result derived?
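Two of these questions, "How many times has this experiment been run?" and "How was that result derived?", become simple queries once each run records what it produced and what it consumed. The sketch below assumes such a record exists; the record layout, the identifiers and the helper names `run_count` and `derivation` are all invented for illustration and do not reflect how myGrid stores provenance.

```python
from collections import Counter

# Each record: (output, workflow that produced it, inputs consumed).
# All identifiers are made up for illustration.
runs = [
    ("masked_seq_1", "mask_repeats", ["genbank_entry_1"]),
    ("blast_hits_1", "similarity_search", ["masked_seq_1"]),
    ("blast_hits_2", "similarity_search", ["masked_seq_1"]),
]


def run_count(records):
    """How many times has each workflow been run?"""
    return Counter(workflow for _, workflow, _ in records)


def derivation(records, item):
    """Walk back from an item through everything it was derived from."""
    produced_by = {out: (wf, ins) for out, wf, ins in records}
    trail = []
    pending = [item]
    while pending:
        current = pending.pop()
        if current in produced_by:
            wf, ins = produced_by[current]
            trail.append((current, wf, ins))
            pending.extend(ins)
    return trail


counts = run_count(runs)
lineage = derivation(runs, "blast_hits_1")
```

Items with no producing record, such as the original database entry, end the walk; they are the raw inputs of the experiment.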
19 Collections of Tasks Finding Description Service Discovery Enactment Building Workflow Provenance Storage Data Management Querying Domain Tasks Service Providers Bioinformaticians Scientists Annotation providers
20 Registry mIR Discovery View Haystack Provenance Browser FreeFluo Enactor Taverna WF Builder Pedro Annotation tool Ontology Store Others WSDL Soaplab Interface Description Annotation/description Annotation providers Query & Retrieve Workflow Execution Store data/knowledge Scientists Bioinformaticians invoking Querying/sharing/federating/registering Service Providers Data descriptions Vocabulary
21 myGrid Service Stack
Web Service (Grid Service) communication fabric AMBIT Text Extraction Service Provenance Personalisation Event Notification Gateway Service and Workflow Discovery myGrid Information Repository Ontology Mgt Metadata Mgt Workbench Taverna Talisman Native Web Services SoapLab Web Portal Legacy apps Registries Ontologies FreeFluo Workflow Enactment Engine OGSA-DQP Distributed Query Processor Bioinformaticians Tool Providers Service Providers Applications Core services External services Views Legacy apps GowLab
22 Two+ Paths
Core functionality
–Services – Soaplab and Gowlab
–Workflow enactment engine – Freefluo
–Workflow workbench – Taverna
–Data integration – OGSA-DQP
–Information model & management
Innovative work
–Service and workflow registration
–Semantic discovery
–Provenance management
–Text mining
In between
–Event notification
–Gateway
23 myGrid Service Stack
Web Service (Grid Service) communication fabric AMBIT Text Extraction Service Provenance Personalisation Event Notification Gateway Service and Workflow Discovery myGrid Information Repository Ontology Mgt Metadata Mgt Workbench Taverna Talisman Native Web Services SoapLab Web Portal Legacy apps Registries Ontologies FreeFluo Workflow Enactment Engine OGSA-DQP Distributed Query Processor Bioinformaticians Tool Providers Service Providers Applications Core services External services Views Legacy apps GowLab
25 Run the Workflow Viewing intermediate results
26 Run the Workflow
27 Drilling Down: myGrid and Semantics
Workflow and service discovery
–Prior to and during enactment
–Semantic registration
Workflow assembly
–Semantic service typing of inputs and outputs
Provenance of workflows and other entities
Experimental metadata glue
Use of RDF, RDFS, DAML+OIL/OWL
–Instance store, ontology server, reasoner
–Materialised vs at point of delivery reasoning
myGrid Information Model
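Semantic typing of service inputs and outputs lets discovery ask "which services can take this kind of data?" rather than matching on names. The sketch below stands in for that idea with a toy is-a hierarchy and a plain lookup; it is not an OWL or DAML+OIL reasoner, and the ontology terms, the service annotations and the function names are all assumptions made for illustration.

```python
# Toy is-a hierarchy: child -> parent. Terms are invented for illustration.
IS_A = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
}

# Hypothetical annotations: service -> semantic type of its input.
SERVICE_INPUT = {
    "transeq": "nucleotide_sequence",
    "pepstats": "protein_sequence",
    "seqret": "sequence",
}


def ancestors(term):
    """Yield the term itself and every more general term above it."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def services_accepting(data_type):
    """Services whose input type is the data type or a generalisation of it."""
    accepted = set(ancestors(data_type))
    return sorted(s for s, t in SERVICE_INPUT.items() if t in accepted)


for_protein = services_accepting("protein_sequence")
for_generic = services_accepting("sequence")
```

A service typed for any sequence is offered for a protein sequence, but a service typed for protein sequences is not offered for data known only to be "a sequence".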
28 Semantic Discovery
View annotations on workflow
Pedro data capture tool
Drag a workflow entry into the explorer pane and the workflow loads.
Drag a service/workflow to the scavenger window for inclusion into the workflow.
29 Tutorial focus
Core functionality
–Services – Soaplab and Gowlab
–Workflow enactment engine – Freefluo
–Workflow workbench – Taverna
–Data integration – OGSA-DQP
–Information model & management
Innovative work
–Service and workflow registration
–Semantic discovery
–Provenance management
–Text mining
In between
–Event notification
–Gateway
30 Roadmap
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management
LSID authorities, Taverna workbench, Registry; services, workflows, data
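The LSID authorities on this roadmap resolve Life Science Identifiers, which are URNs of the general shape `urn:lsid:authority:namespace:object`, optionally followed by a revision. As a minimal sketch of what such an identifier carries, the parser below splits one into its parts. The `LSID` tuple, the `parse_lsid` function and the example identifier are invented for illustration; no real authority or myGrid identifier is implied.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID into authority, namespace, object and optional revision."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]  # the urn:lsid prefix is case-insensitive
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier for a workflow, revision 1.
example = parse_lsid("urn:lsid:example.org:workflow:42:1")
```

The authority part is what a client would use to locate the service that can return the data or metadata behind the identifier.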
31 Sessions on Details
Workflows - hands on with Taverna
Semantics
Timetable – split sessions
–Session 1
  Group 1 – hands on (Swanson)
  Group 2 – semantics (Newhaven)
–Teabreak (short)
–Session 2
  Group 1 – semantics (Newhaven)
  Group 2 – hands on (Swanson)
–Discussions and Conclusions
32 Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/