1 A myGrid Project Tutorial
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.

2 Open Source Upper Middleware for Bioinformatics
(Web) Service-based architecture
Targeted at Tool Developers, Bioinformaticians and Service Providers
Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield

3 myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker

4 Roadmap - start
[Diagram labels: services; data]

5 Philosophy
Openness
–open source
–open world of services
–open to wider eScience context
–open to user feedback
–open to third party metadata
Collection of components for assembly
–Pick and mix

6 Tenet I
High level middleware services for data intensive resource interoperation for Bioinformatics
–Information Grid not computational Grid
Exploratory, ad hoc
For individuals
In silico experiment as workflow
Distributed query processing
Information Management

7 Tenet II
High level services for e-Science experimental management:
–Provenance
–Event notification
–Personalisation
Sharing knowledge and sharing components
–Scientific discovery is personal & global.
–Federated third party registries for workflows and services
–Workflow and service discovery for reuse and repurposing
Registry: Register, Find, Annotate
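The Register / Find / Annotate cycle of a third-party registry can be pictured with a small in-memory sketch in Python. Everything here is hypothetical: the `Registry` class, its method names and the example entry are illustrative assumptions, not the myGrid registry interface.

```python
from collections import defaultdict


class Registry:
    """Toy in-memory registry (hypothetical; not the myGrid registry API)."""

    def __init__(self):
        self._entries = {}                      # name -> description
        self._annotations = defaultdict(list)   # name -> [(author, note)]

    def register(self, name, description):
        """A provider publishes a service or workflow."""
        self._entries[name] = description

    def annotate(self, name, author, note):
        """Anyone, not only the provider, may attach metadata."""
        if name not in self._entries:
            raise KeyError(f"unknown entry: {name}")
        self._annotations[name].append((author, note))

    def find(self, keyword):
        """Return names whose description or annotations mention keyword."""
        keyword = keyword.lower()
        hits = []
        for name, description in self._entries.items():
            texts = [description] + [note for _, note in self._annotations[name]]
            if any(keyword in text.lower() for text in texts):
                hits.append(name)
        return sorted(hits)


registry = Registry()
registry.register("blast_wrapper", "Similarity search over nucleotide databases")
registry.annotate("blast_wrapper", "third_party", "Useful for gap filling")
print(registry.find("gap"))
```

The point of the sketch is that `find` searches third-party annotations as well as the provider's own description, which is what makes reuse and repurposing possible without control over the provider.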

8 Tenet III
Open Source and Open Services
–No control or influence over service providers
Open to third party metadata and services
Open extensible architecture
–Assemble your own components
–Designed to work together
–Toolkit
[Diagram labels: Freefluo WfEE; Taverna; View; UDDI registry; Event Notification; mIR; Pedro; Semantic Discovery; Info. Model; Soaplab; Gateway & Portal; LSID; Haystack Provenance Browser]

9 Tenet IV
(Web) Service architecture
–Publication, discovery, interoperation, composition, decommissioning of myGrid services
–WS-I -> OGSA / WSRF
Metadata driven
–Ontologies
–Common information model
–Semantic Web technologies RDF, OWL

10 Tenet V
Middleware for
–Tool Developers
–Bioinformaticians
–Service Providers
Biologists are indirectly supported by the portals and apps these groups develop.

11 Roadmap
[Diagram labels: run workflows; discover services; data management; services; workflows; data]

12 Data-intensive bioinformatics
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
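To make the structure of such a flat-file entry concrete, here is a minimal Python sketch that splits an entry into its two-letter line codes (ID, DE, OS, KW, SQ and so on) and collects the sequence. It assumes the usual layout of a two-character code followed by data from column 6, with sequence lines indented and the entry closed by `//`. The `parse_entry` function and the abridged `SAMPLE` text are illustrative, not part of any myGrid tool.

```python
from collections import defaultdict

# Abridged copy of the entry on the slide; the sequence is cut to one line.
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
//
"""


def parse_entry(text):
    """Group lines by their two-letter code and join the sequence blocks."""
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        code, rest = line[:2], line[5:].strip()
        if code == "SQ":
            in_sequence = True
            fields["SQ"].append(rest)
        elif in_sequence and code.strip() == "":
            # Sequence lines carry no code; drop the spaces between blocks.
            sequence_parts.append(rest.replace(" ", ""))
        else:
            fields[code].append(rest)
    entry = {code: " ".join(parts) for code, parts in fields.items()}
    entry["sequence"] = "".join(sequence_parts)
    return entry


entry = parse_entry(SAMPLE)
keywords = [k.strip() for k in entry["KW"].rstrip(".").split(";")]
print(entry["ID"].split()[0], entry["OS"], keywords)
```

Even this small parser shows why cutting and pasting between services is error-prone: every tool has to agree on where the identifier, the annotations and the raw sequence live in the text.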

13 Use Scenarios
Graves’ Disease
–Autoimmune disease of the thyroid
–Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
–Discover all you can about a gene
–Annotation pipelines and Gene expression analysis
–Services from Japan, Hong Kong, various sites in UK
Williams-Beuren Syndrome
–Microdeletion of 1.55 Mbases on Chromosome 7
–Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
–Characterise an unknown gene
–Annotation pipelines and Gene expression analysis
–Services from USA, Japan, various sites in UK

14 Manually filling a genomic gap
Two major steps:
–Extend into the gap: Similarity searches; RepeatMasker, BLAST
–Characterise the new sequence: NIX, Interpro, etc…
Numerous web-based services (e.g. BLAST, RepeatMasker)
Cutting and pasting between screens
Large number of steps
Frequently repeated – info now rapidly added to public databases
Don’t always get results
Time consuming
Huge amount of interrelated data is produced – handled in lab book and files saved to local hard drive
Mundane
Much knowledge remains undocumented
Bioinformatician does the analysis

15 WBS Workflows:
[Workflow diagram labels] GenBank Accession No GenBank Entry Seqret Nucleotide seq (Fasta) GenScan Coding sequence ORFs prettyseq restrict cpgreport RepeatMasker ncbiBlastWrapper sixpack transeq 6 ORFs Restriction enzyme map CpG Island locations and % Repetitive elements Translation/sequence file. Good for records and publications Blastn Vs nr, est databases. Amino Acid translation epestfind pepcoil pepstats pscan Identifies PEST seq Identifies FingerPRINTS MW, length, charge, pI, etc Predicts Coiled-coil regions SignalP TargetP PSORTII InterPro PFAM Prosite Smart Hydrophobic regions Predicts cellular location Identifies functional and structural domains/motifs Pepwindow? Octanol? ncbiBlastWrapper URL inc GB identifier tblastn Vs nr, est, est_mouse, est_human databases. Blastp Vs nr RepeatMasker Query nucleotide sequence ncbiBlastWrapper Sort for appropriate Sequences only
Colour key: Pink: Outputs/inputs of a service; Purple: Tailor-made services; Green: Emboss soaplab services; Yellow: Manchester soaplab services; Grey: Unknowns
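A workflow of this kind is a dependency graph: each service runs once the outputs it consumes are available. The Python sketch below illustrates that idea with the standard-library `graphlib` module (Python 3.9+). The wiring between steps is an assumption made for illustration, since the slide's arrows are not recoverable from the transcript, and every service is replaced by a local stub that only labels its input. In myGrid the steps would be remote services run by a workflow enactor, not local functions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# step -> upstream steps it consumes (assumed wiring, for illustration only)
WORKFLOW = {
    "genbank_entry": [],
    "seqret": ["genbank_entry"],
    "repeatmasker": ["seqret"],
    "blastn": ["repeatmasker"],
    "transeq": ["seqret"],
    "pepstats": ["transeq"],
}


def stub(name):
    """Return a stand-in for a remote service that just labels its input."""
    return lambda *inputs: f"{name}({', '.join(inputs)})"


SERVICES = {step: stub(step) for step in WORKFLOW}


def run_workflow(workflow, services, accession):
    """Run every step after its dependencies and keep all intermediate results."""
    results = {}
    for step in TopologicalSorter(workflow).static_order():
        upstream = [results[dep] for dep in workflow[step]]
        inputs = upstream or [accession]   # source steps take the accession
        results[step] = services[step](*inputs)
    return results


# "ACCESSION" is a placeholder, not a real GenBank accession number.
results = run_workflow(WORKFLOW, SERVICES, "ACCESSION")
print(results["blastn"])
print(results["pepstats"])
```

Keeping every intermediate result in `results`, rather than only the final outputs, mirrors the need to inspect and record what each step produced.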

16 Graves’ Disease
Bioinformatics Annotation Pipeline: What is known about my candidate gene? (Medline, OMIM, GO, BLAST, EMBL, DQP Query)
Genotype Assay Design System: Is this SNP present in my samples?
3D Protein Structure: What is the structure of the protein product encoded by my candidate gene?
[Diagram labels: Candidate gene pool; Gene ID; SNP; Primer Design; Restriction Fragment Length Polymorphism experiment; Use primers designed by myGrid to amplify region flanking SNP on the gene; PDB; Query PDB & display protein structure; Obtain information about protein & extract information about active site; Swiss-Prot; AMBIT; Interpro; Emboss Eprimer application in SoapLab; Selection of restriction enzyme; Talisman; Emboss Restrict in SoapLab; Determine whether coding SNP affects the active site of the protein]
Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1) (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.

17 Experiment life cycle
–Forming experiments
–Executing and monitoring experiments
–Managing lifecycle, provenance and results of experiments
–Sharing services & experiments
–Discovering and reusing experiments and resources
–Personalisation

18 (e-)Scientists…
…Experiment
–Can workflow be used as an experimental method?
–How many times has this experiment been run?
…Analyze
–How do we manage the results to draw conclusions from them?
–How reliable are these results?
…Collaborate
–Can we share workflows, results, metadata etc?
…Publish
–Can we link to these workflows and results from our papers?
…Review
–Can I find, comprehend and review your work?
–How was that result derived?

19 Collections of Tasks
[Diagram labels] Finding Description Service Discovery Enactment Building Workflow Provenance Storage Data Management Querying Domain Tasks
Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers

20 [Diagram labels] Registry mIR Discovery View Haystack Provenance Browser FreeFluo Enactor Taverna WF Builder Pedro Annotation tool Ontology Store Others WSDL Soaplab Interface Description Annotation/description Annotation providers Query & Retrieve Workflow Execution Store data/knowledge Scientists Bioinformaticians invoking Querying/sharing/federating/registering Service Providers Data descriptions Vocabulary

21 myGrid Service Stack
[Architecture diagram labels: Web Service (Grid Service) communication fabric; AMBIT Text Extraction Service; Provenance; Personalisation; Event Notification; Gateway; Service and Workflow Discovery; myGrid Information Repository; Ontology Mgt; Metadata Mgt; Workbench; Taverna; Talisman; Native Web Services; SoapLab; GowLab; Web Portal; Legacy apps; Registries; Ontologies; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor; Views; Bioinformaticians; Tool Providers; Service Providers; Applications; Core services; External services]

22 Two+ Paths
Core functionality
–Services – Soaplab and Gowlab
–Workflow enactment engine – Freefluo
–Workflow workbench – Taverna
–Data integration – OGSA-DQP
–Information model & management
Innovative work
–Service and workflow registration
–Semantic discovery
–Provenance management
–Text mining
In between
–Event notification
–Gateway

23 myGrid Service Stack
[Architecture diagram labels: Web Service (Grid Service) communication fabric; AMBIT Text Extraction Service; Provenance; Personalisation; Event Notification; Gateway; Service and Workflow Discovery; myGrid Information Repository; Ontology Mgt; Metadata Mgt; Workbench; Taverna; Talisman; Native Web Services; SoapLab; GowLab; Web Portal; Legacy apps; Registries; Ontologies; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor; Views; Bioinformaticians; Tool Providers; Service Providers; Applications; Core services; External services]

25 Run the Workflow
Viewing intermediate results

26 Run the Workflow

27 Drilling Down: myGrid and Semantics
Workflow and service discovery
–Prior to and during enactment
–Semantic registration
Workflow assembly
–Semantic service typing of inputs and outputs
Provenance of workflows and other entities
Experimental metadata glue
Use of RDF, RDFS, DAML+OIL/OWL
–Instance store, ontology server, reasoner
–Materialised vs at point of delivery reasoning.
myGrid Information Model
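Semantic typing of service inputs and outputs means a service can be offered for a piece of data whenever the data's concept is the same as, or more specific than, the concept the service accepts. The Python sketch below illustrates that matching rule with a hand-written subclass table standing in for an ontology and reasoner. The concept names, the annotations on each service and the `seq_formatter` service are all hypothetical; myGrid itself uses RDF/OWL descriptions, not a Python dictionary.

```python
# child concept -> parent concept (hypothetical mini-ontology)
SUBCLASS_OF = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
}

# service -> (input concept, output concept); annotations are assumed
SERVICES = {
    "transeq": ("nucleotide_sequence", "protein_sequence"),
    "pepstats": ("protein_sequence", "data"),
    "seq_formatter": ("sequence", "sequence"),
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def services_accepting(concept):
    """Services whose declared input type subsumes the given concept."""
    return sorted(
        name for name, (accepted, _) in SERVICES.items() if is_a(concept, accepted)
    )


print(services_accepting("protein_sequence"))
print(services_accepting("nucleotide_sequence"))
```

A service that accepts any `sequence` is found for both protein and nucleotide data, while a service typed for `protein_sequence` is not offered for a generic `sequence`, which is the behaviour a plain keyword match cannot give.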

28 Semantic Discovery
View annotations on workflow
Pedro data capture tool
Drag a workflow entry into the explorer pane and the workflow loads.
Drag a service/workflow to the scavenger window for inclusion into the workflow

29 Tutorial focus
Core functionality
–Services – Soaplab and Gowlab
–Workflow enactment engine – Freefluo
–Workflow workbench – Taverna
–Data integration – OGSA-DQP
–Information model & management
Innovative work
–Service and workflow registration
–Semantic discovery
–Provenance management
–Text mining
In between
–Event notification
–Gateway

30 Roadmap
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management
[Diagram labels: LSID authorities; Taverna workbench; Registry; services; workflows; data]

31 Sessions on Details
Workflows - hands on with Taverna
Semantics
Timetable – split sessions
–Session 1
  Group 1 – hands on (Swanson)
  Group 2 – semantics (Newhaven)
–Tea break (short)
–Session 2
  Group 1 – semantics (Newhaven)
  Group 2 – hands on (Swanson)
–Discussions and Conclusions

32 Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/

